This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

January 2008

Wiktionary:Beer parlour/2008/January/policies

translations between FLs of non-English phrases

There are a number of phrases that exist in translation among various foreign languages, but not (idiomatically) in English. For example, in Hebrew, a phrase meaning "you're welcome" is על לא דבר, literally "for nothing"; likewise, in French (from what I understand, not knowing French), there's de rien, literally "nothing", for "you're welcome". In English, there's no real idiomatic counterpart to these: we say "oh, it's nothing", but that's not a set phrase, and rightfully is redlinked. Yet we (meaning I) would want a way to note that the French phrase and the Hebrew phrase are near-exact translations of one another. (Right now, we only have that they're both translations of "you're welcome". One can look up the individual words comprising the FL phrases and realize that they match up, but that's convoluted.) I suggest therefore that idioms such as these have Translations sections which will list only translations into other FLs which carry similar idiomatic meaning and are also near-exact literal translations — and only where English has no such idiom. (This would of course require a change to ELE and, so, a vote; right now, it's just an idea for discussion.)—msh210 19:35, 2 January 2008 (UTC)

That's an interesting thought. Personally, I'd tend to put that sort of information in an etymology section ('Literally “on not a thing”; compare French de rien (you're welcome), which literally means “of nothing”.') — but that's partly because I only include that sort of information when I think it might have etymological significance (since so many idioms have been borrowed by and from Hebrew), and partly because I don't know enough languages that I might fear overfilling the etymology section. :-P   Another useful exception to the no-FL-translations rule would be the case where multiple foreign languages have words for a concept without a CFI-meeting English-language entry, like Hebrew and French après-demain (the day after tomorrow). —RuakhTALK 03:34, 3 January 2008 (UTC)
Could this be adequately handled with sufficient coordination with FL wiktionaries? Or possibly non-NS:0 pages such as Appendix pages? --Bequw¢τ 19:40, 3 January 2008 (UTC)
We haven't had the "cognate" discussion in a while; adding it as a heading met significant resistance in the past. (Some might say my objections were militant - perhaps so.) The addition of a "translations" section to a FL entry is still very strongly discouraged; there is no reasonable way that information can be verified here (whereas it can be verified on the FL wikt). And making exceptions only for certain entries that have certain properties in some FLs is too precarious a basis on which to set a rule. --Connel MacKenzie 22:26, 3 January 2008 (UTC)
Fair enough. It can be hard to bring newbies up to speed with even our most consistent policies; I won't be the one calling for open mayhem. :-P —RuakhTALK 04:17, 5 January 2008 (UTC)

"Abbreviation" L3 header

The standard, the way I understand it, is that we do not use the Phrase POS header when another does better; for example, if a phrase is a verb, then we list it as a Verb, not as a Phrase. I assume (though I haven't seen anyone say this) that the logic is as follows: It's obviously a phrase (count the number of words, and see that there's more than one), so use the Verb header to show that it's also a verb (which is not as obvious).

The same would seem to apply to Abbreviation. If something is an abbreviation, but also a noun, then we should list it as a Noun, and put {{abbreviation of}}, or something similar, in its definition line (or in its Etymology). That abbreviation info will suffice for people to know it's an abbreviation; and the Noun POS header will give the non-obvious info that readers need: this is almost completely analogous to the phrase case.

This is the way I've been doing it, and it seems the most reasonable way to me. Someone called me on it, though, so I seek public opinion.—msh210 23:16, 2 January 2008 (UTC)

That's not the way I do it. Rather, I would use ===Abbreviation=== and mark the part of speech at the head of the "definition" line as (noun) or whatever. An abbreviation is automatically not truly a lemma, since it is an abbreviation of something else. The something else will be marked for POS. I have done c. this way as an example. Also see WT:POS. --EncycloPetey 23:22, 2 January 2008 (UTC)
A minor point, but we should then have context templates, à la {{pos-n}}, for every part of speech. Right now I only see {{pos-a}} (adjectives), {{pos-vi}}, and {{pos-vt}}. In English alone we'll need a whole bunch more.—msh210 23:32, 2 January 2008 (UTC)
That point has been raised in previous incarnations of this discussion, and I agree with you. --EncycloPetey 23:34, 2 January 2008 (UTC)
I like the layout of c.; it lets the Abbreviation header be used for what is clearly an abbreviation, and also allows for the grammatical information to be present for those who wish it to be there. Conrad.Irwin 00:55, 3 January 2008 (UTC)

Proposed vote on fiction concordances.

Pursuant to the discussion on Harry Potter above, and the RfV on "drider", I plan to call a policy vote on fictional words (other than proper nouns) that do meet the CFI but only within the context of a specific fictional universe, or discussions of that universe. Specifically, I intend to propose that all such words be banned from the main entry space, and instead included (if at all) in a universe-specific concordance along the lines of Concordance:A Clockwork Orange. If anyone has anything to say about this in advance of a vote, speak now or forever rest in peace. bd2412 T 03:26, 3 January 2008 (UTC)

Concordance:A Clockwork Orange doesn't correspond to any definition of concordance with which I'm familiar. Granted, the Concordance: space is miserably underused, but I'd prefer if we can limit it to actual concordances... I would suggest "Appendix: Glossary of Foo terms" as an alternative convention, with option to create subpages ("Appendix:Glossary of Foo terms/Bar") for individual terms where warranted. -- Visviva 04:48, 3 January 2008 (UTC)
It looks like there are actually two Clockwork Orange word lists: Concordance:A Clockwork Orange and Concordance:Nadsat lexicon. Mike Dillon 05:57, 3 January 2008 (UTC)
Most of our 'Concordances' (all?) are linked word lists, with no information about context, frequency, citations, or location. The concordances to Shakespeare and the Sherlock Holmes stories are particularly notorious. However, those are complete word lists, with all words that appear in those works. To propose a Concordance, I would want to see a word list that is likewise comprehensive. So, I agree with Visviva that the proposal sounds more like an Appendix than a Concordance. --EncycloPetey 04:53, 3 January 2008 (UTC)
Yes, I agree that Appendix space would be better given the rather narrow definition of concordance - the Clockwork Orange page should be moved to an appendix, actually. My point is to get these things out of mainspace and still have a place to warehouse them, to avoid disputes erupting over muggles and corbomite and banthas. bd2412 T 05:45, 3 January 2008 (UTC)
I don't see why these shouldn't be included as long as the citations are truly independent. I could see eliminating protocol droid on the grounds that they all refer to the same character, but that's a proper entity and easy to identify. "Fictional universe" does not necessarily imply a unique concept in that way; please elaborate on how specific it should be. DAVilla 07:58, 3 January 2008 (UTC)
I agree with keeping words where the citations are independent - but is a peer-reviewed study of a fictional character, for example, "independent"? For example:
  • 2003: Angela Jane Weisl, The Persistence of Medievalism: Narrative Adventures in Contemporary Culture, p. 200:
    The battle is described in the script: Luke ignites his lightsaber and screams in anger, rushing at his father with a frenzy we have not seen before.
The above citation is not independent of the Star Wars universe even though it is not occurring within that universe. Compare:
  • 2004: Les Pardew, Game Design for Teens, p. 71:
    With some of the modifications [to the World War II battlefield game, 1942], you can even play with a lightsaber, thus showing how one idea can branch into many others.
  • 2006: Maddy B., The Haunter of the Loch, p. 41:
    [After finding a glowing blade,] Brian being Brian, his first thought was of a lightsaber.
These latter two citations make no reference to Star Wars; they simply presume that the characteristics of a lightsaber are known to the reader. bd2412 T 14:19, 3 January 2008 (UTC)
Your wording in the proposal addresses my concern. DAVilla 15:16, 7 January 2008 (UTC)

Hearing no further objection, here it is: Wiktionary:Votes/pl-2008-01/Appendices for fictional terms. Cheers! bd2412 T 02:35, 5 January 2008 (UTC)

I am very much in support of this, but I feel that there should be a way of linking to the correct Appendix from dictionary entries, both those that otherwise exist (in particular) and those that do not. This would allow these terms to be found by those looking them up - to have the information is all very well, but it would be even better if the newbies can find it.
If the word otherwise exists, use this with the {see} template.
For the use of firebolt in relation to the Harry Potter universe, see our Harry Potter appendix
If the word doesn't otherwise exist
firebolt is a word invented for the Harry Potter series, for its definition see Harry Potter appendix.
Feel free to create an entry below for other uses of this word
Would this be sensible, useful or acceptable? Would perhaps a transcludable {{Wiktionary:Project-Noarticletext}} with a space for a link to the appendix be better in the case in which the page doesn't exist? Conrad.Irwin 17:01, 5 January 2008 (UTC)
Quite frankly, I think that is a problem for the search engine tweakers to figure out. If you look up the word and it's not in the dictionary proper, an appendix which actually contains it should be the first thing to come up. bd2412 T 20:21, 6 January 2008 (UTC)

Why we can not permit the "non-lemma" format to be considered standard

Because it gives users license to remove content.

See this edit and this one, note that the IP-anon added Usage notes to replace the definition that EP deleted.

In particular, it is sad and painful to watch Ric converting all of his excellent work to the dumbed-down stub format, especially when it will all have to be re-done eventually by someone. (At least they can refer to the history, perhaps "undo"? Would need to restore the templates.)

These are stubs. They are a fill-in for a proper entry; they are useful and many may remain this way; but a proper full WT:ELE form entry must never be replaced by a stub. The stub is not the desired form.

This removal of content is utterly, totally wrong. Robert Ullmann 15:02, 3 January 2008 (UTC)

Interesting to see this POV at work. What definition did I delete? None. The anon removed the definition when he made this edit, thereby converting the definition into a redirect. I informed the anon that a redirect would be inappropriate, and after some going back and forth, we ended up with the current entry for lietuvių that identifies the form, points to the lemma, and provides Usage notes.
In other words, what actually happened was that the anon thought having a separate, duplicate version of the lemma definition on the inflected form was inappropriate and misleading, and wanted to point instead to the lemma form. So, educated anons agree that having the non-lemma simply duplicate the lemma is wrong. In this case particularly so, since the original definition "Lithuanian" is wrong. The word lietuvių does not translate as "Lithuanian" except when it is used to mean the language, which is not what the lemma means. The word lietuvių means "of (the) Lithuanians".
Net result: content was added, not removed. --EncycloPetey 17:46, 3 January 2008 (UTC)
For my clarification, would a good (though a bit minimal) non-lemma article (for the lemma/non-lemma distinguishing camp) be habla? It lists each grammatical inflection on a separate definition line and gives example usage and translation. --Bequw¢τ 21:55, 3 January 2008 (UTC)
It's a start, but the Related terms section is not formatted correctly. It should either be at L4 under Noun, or placed following the Verb section at L3. Also, I would want to see a pronunciation and quotations added, at a minimum. I've made a couple of changes. --EncycloPetey 22:06, 3 January 2008 (UTC)
EncycloPetey, I don't see the original complaint as attacking you, if perhaps you do. The point is that encouraging "lemma mentality" is simply wrong. It leads to mis-perceptions like that anon's initial edit. --Connel MacKenzie 22:15, 3 January 2008 (UTC)
I would argue that the anon's initial edit was caused by Wikipedia's "redirect mentality", not by lemmatic thinking. Have you looked at the page for ? Do you really think it would be better to eliminate all the verb content and replace it with an exact duplicate of what appears on the page for , instead of having this form-specific content? To make that change would be a loss of information, which is actually what would be wrong. Lemmata have been used for centuries, and having a few people calling them "wrong" doesn't make it so. --EncycloPetey 22:29, 3 January 2008 (UTC)
That is not what is being proposed (anymore ;). What would be better is to have an entry as informative as the "lemma", but for this particular inflected form. In order to do this properly, in a way that will work, we need to have an entry format flexible enough that "definitions" are not required. As has been said before, the "dis-inflection" of a word can be more important, more precise, and more useful than the definition. The reason that this format works for habla is because there is only one sense of the verb hablar. It wouldn't work for ran as there are multiple senses of run. The other thing that confuses me about this entry is why hablar is listed as a related term. It would also be possible to add the conjugation table from hablar to this entry with no ill effects. Conrad.Irwin 22:58, 3 January 2008 (UTC)
Actually, the RAE lists 20 senses for , we just don't have the other 18 yet. --EncycloPetey 23:00, 3 January 2008 (UTC)
Oh, and adding inflection tables presents a new problem as well, since a large proportion of the inflection table templates rely on the PAGENAME to generate the inflection. Including inflections in non-lemmata would require first re-writing most of the current inflection table templates. --EncycloPetey 23:03, 3 January 2008 (UTC)
In which case the format at habla will not work at all and we need to find a better one. Perhaps expanding the sentences to include some context would alleviate the issue, as would adding a gloss, or even removing them altogether, but that would be a shame. Perhaps it would be best to put example sentences under usage notes in the cases that they are useful - though this would be messier. I doubt that many of the conjugation tables do use {PAGENAME} as it is unusual (in everything I have seen) to have the lemma form as a stem; this would in any case not be a long-term issue at all. Conrad.Irwin 01:28, 4 January 2008 (UTC)
The verb conjugation templates don't use {PAGENAME} but the adjective/noun inflection templates do. --Bequw¢τ 13:51, 4 January 2008 (UTC)
But even in those cases, surely it is trivial to replace {{{PAGENAME}}} with {{{lemma|{{{PAGENAME}}}}}} (or what have you), so that the same table can be generated on an arbitrary page? -- Visviva 17:14, 4 January 2008 (UTC)
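The fallback Visviva describes can be sketched outside wikitext. In MediaWiki template syntax, a parameter written as {{{lemma|default}}} yields the lemma argument when the caller passes one and the default otherwise. The Python below only illustrates that behaviour and is not any existing Wiktionary template: resolve_lemma, decline, and the ENDINGS table are hypothetical names, and the Latin first-declension singular endings serve purely as sample data.

```python
# Hypothetical sketch: a declension table that accepts an explicit lemma
# and falls back to the page name, like {{{lemma|<page name>}}} in a template.

# Sample data (assumption for illustration): Latin first-declension singular endings.
ENDINGS = {
    "nominative": "a",
    "genitive": "ae",
    "dative": "ae",
    "accusative": "am",
}


def resolve_lemma(params: dict, page_name: str) -> str:
    """Return the 'lemma' argument if the caller passed one, else the page name."""
    return params.get("lemma", page_name)


def decline(params: dict, page_name: str) -> dict:
    """Build the table from the resolved lemma, so it works on any page."""
    lemma = resolve_lemma(params, page_name)
    stem = lemma[:-1]  # assumes the lemma ends in the one-letter nominative ending
    return {case: stem + ending for case, ending in ENDINGS.items()}


# On the lemma page no argument is needed; on a form-of page it is passed explicitly.
on_lemma_page = decline({}, "rosa")
on_form_page = decline({"lemma": "rosa"}, "rosae")
```

Both calls build the same table, which seems to be the point of the suggestion: the table logic stays as it is, and only the source of the lemma changes.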
Bad content should be removed. We already remove lots of content that's bad, whether because it's inaccurate, or not neutral, or not verifiable, or not definitional, or in the wrong place, or simply because we don't want it (as with many brand names — a dictionary could include them, but we don't want to be a haven for spammers). In the case of non-lemma entries, "in the wrong place" is the relevant factor. —RuakhTALK 00:40, 4 January 2008 (UTC)

Robert, I've "dumbed down" my old stuff because after doing it long enough, I realized how messy it is to do things "your way". In particular with verbs. Of course I always have trouble thinking of specific examples, but the verb pleca comes right away. We don't have an entry for it yet, but it has two definitions. The first is "to leave". It's intransitive and works by itself. However, the other definition is "to vow". This is reflexive and would be written "a se pleca". Trying to format pleci would be a nightmare for me. If I only put in the definition for "pleci", you leave, it wouldn't be complete. But the definition for "te pleci" would have to go under another verb header, because you need to have the 'te' in there or it wouldn't be the same. uita is the same. Intransitively it means "to forget". Reflexively it means "to look at." So should we have separate entries for "pleci" (you leave) and "te pleci" (you vow), "uit" (I forget) and "mă uit" (I look at)? If you use one entry you'd have to point out the difference, but how? Look at User:Opiaterein/uit and try to tell me that's a good way to do it, especially when the subjunctive ONLY means "should" when it isn't used with other stuff. — [ ric ] opiaterein — 18:19, 4 January 2008 (UTC)

Concerning User:Opiaterein/uit, look at how the definitions (which are good) don't match the example sentences. If you say "să mă uit" alone, it doesn't mean the same as if you added "vreau" to the front of the sentence. So if someone was trying to figure out what "să mă uit" means, because they saw it somewhere, without that example sentence they would think that "vreau să mă uit" means "I want I should look". So with the definition and the example sentence not matching, that just creates confusion. Better to have the form-of information and then an example sentence. (I myself am lazy with example sentences, but I do try to throw them in once in a while, if I feel that they're needed.) — [ ric ] opiaterein — 18:23, 4 January 2008 (UTC)
It is not a good way of doing it at all. For cases such as that adding definitions makes the entry more complicated, however the example sentences do help. I have taken the liberty of making it look slightly tidier, though I am not sure my version is the best either. Is the non-reflexive sense also part of uita or is it actually from a different lemma? Conrad.Irwin 19:19, 4 January 2008 (UTC)
In the dictionary you would find it under "uita, a" rather than "uita, a se". So it would come within the section describing "uita, a". The thing I don't like about your formatting is that it wouldn't be the same between entries. Standardization is the key, otherwise newbie people won't know what to do. — [ ric ] opiaterein — 19:28, 4 January 2008 (UTC)
Also, all that information on the head line without a # underneath looks strange. Also, most Romance languages have third-person singular present and second-person singular imperative forms that are the same. If you put that all on one head line, it'd be really long and clumsy. — [ ric ] opiaterein — 19:31, 4 January 2008 (UTC)
As per discussion (a long way) above, often the definitions confuse the issue rather than help to sort them out, therefore we need a format that is happy without definitions, but encourages the addition of them when necessary. I agree that all the inflection information won't always fit in one line, in which case two or more could be added. I also agree that having no # line looks strange, but that is only because we are not used to it yet. Conrad.Irwin 19:47, 4 January 2008 (UTC)
We need a format that works for all form-of entries, which is what we have. It is still completely possible to add definitions if they're necessary. It is more than completely possible to add example sentences if they're necessary. There's nothing wrong with the current format. See veştejiţi. — [ ric ] opiaterein — 19:51, 4 January 2008 (UTC)
Agree. Conrad, multiple inflection lines under a single POS cause a host of added complications. When those happen on a lemma page, it means that there are two forms that have different genders, and possibly even different inflections. But, you can't add the inflection section underneath each one, since that is a separate section, and therefore can't be inserted between them without violating the page structure. Expanding this problem to non-lemma pages would not be a good idea. (And I would rather see it not happen on the lemma pages, but people are resistant to Noun 1, Noun2, etc., so we're stuck with the problem for now) --EncycloPetey 04:17, 5 January 2008 (UTC)
I am not sure what you are saying, EncycloPetey. How would you mark up the cases, such as User:Opiaterein/uit, where multiple inflection forms of the same word are present? To split that =Verb 1= =Verb 2= etc. would be wrong, as they are all the same verb. Conrad.Irwin 23:16, 5 January 2008 (UTC)
Some words have more than one declension. There are a few words in Romanian that have two meanings, and a plural for each. (an example would be vise and visuri, plurals of vis, but they mean the same thing. I can't think of any immediately with different plurals with different meanings.) Also, some form-of words are forms of different words. capete is the plural of cap and capăt. I don't do =POS1 = and =POS2=, I just repeat it twice. ===Noun=== ===Noun===, don't see anything wrong with it, really. — [ ric ] opiaterein — 01:52, 11 January 2008 (UTC)
Just repeating the heading is worse. Those are supposed to be broken out in the "multiple etymology" format, as per ELE, even if that means that the separate etymologies only indicate the difference by gender (more likely, there is more difference than just gender.) Note that when your entry is parsed (whether by Conrad's fancy parser for the different "skin" views here, or ninjawords, or my offline parser, or Robert's, or Hippietrail's, or Patrick Stridvall's) either the first =Noun= section or the second =Noun= section is simply discarded. The only possible reasons for intentionally being so nonstandard, would be to prevent reuse, or to "break" the alternate page views. I don't see anything in your examples that needs to be hidden, so I'm left wondering why you'd intentionally want such things not to show up. --Connel MacKenzie 17:43, 11 January 2008 (UTC)
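One way the loss Connel describes could arise, shown as a hedged Python sketch: if a parser stores sections in a mapping keyed by heading text, a repeated heading overwrites the earlier one. This is not the code of any parser named above; sections_by_name and sections_in_order are hypothetical functions, and the sample entry is a simplified stand-in for capete.

```python
import re

# Matches a wiki heading such as ===Noun=== and captures its text.
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")

# Simplified stand-in for an entry that repeats the same POS header.
SAMPLE = "===Noun===\n# plural of cap\n\n===Noun===\n# plural of capăt\n"


def sections_by_name(wikitext: str) -> dict:
    """Key sections by heading text; a repeated heading replaces the earlier body."""
    sections = {}
    current = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(2)
            sections[current] = []  # a second ===Noun=== resets the first one
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def sections_in_order(wikitext: str) -> list:
    """Keep every section as a (heading, lines) pair, so duplicates survive."""
    sections = []
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(2), []))
        elif sections and line.strip():
            sections[-1][1].append(line)
    return sections


keyed = sections_by_name(SAMPLE)     # only one "Noun" entry remains
ordered = sections_in_order(SAMPLE)  # both "Noun" sections are kept
```

A consumer built like the first function would silently show only one of the two plurals, whereas distinct headings (for example separate Etymology sections) give each section its own key.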
That's what I would do if the etymologies were actually different, which is something I have seen, but haven't added yet. At least I'm pretty sure I haven't. Anyway, I'm not sure what you mean by the rest of your message, so I guess I'll leave this at... that. — [ ric ] opiaterein — 23:34, 11 January 2008 (UTC)


Declension templates, opinion gathering

Any thoughts about this ugly mish mash of templates? I would like to try and make them all look more similar, though I appreciate that would take a very long time, and will almost certainly lead to arguments as to which ones are nicer. A lot of them are perfectly acceptable on their own, but when you see two or more on a page it can begin to look ugly and unprofessional. My preferred format would be something like the {{is-decl-noun}} with a [show] link in the heading. What do other people think? This is a minor issue compared to the much bigger ones of grammar versus meaning being discussed in other places, so a bit of light relief :) Conrad.Irwin 19:37, 4 January 2008 (UTC)

I'm not as worried about them looking different as some of them being downright ugly. :( But the height of some of the right-side-float ones can also be an issue. I'd rather all declension and conjugation templates go under a ====Declension==== header than float to the right. — [ ric ] opiaterein — 19:56, 4 January 2008 (UTC)
And then there's {{fr-infl-noun}}, which conflates it with an entry-top template (or whatever it's called), and the conjugation templates...
Seems like it would be best if we had a CSS class for all tables of this sort; hardly anyone seems to use the standard wikitable class for declensions and inflections, so we do definitely need something here. The show/hide issue can probably be decided case-by-case, although I think in most cases you're right that it should be hidden. -- Visviva 01:26, 5 January 2008 (UTC)
We should impose some basics, at least - layout (where the cases go relative to each other, and relative to categories such as gender and number); color schemes, font styles. bd2412 T 02:42, 5 January 2008 (UTC)
I prefer that different languages have different looks and styles for their templates, since it makes it much easier to know when you've found the language section you're looking for. The language headers are supposed to do that, but on a crowded page, they're not immediately visible. The various templates thus serve a function in being different. That's not to say that some standardization would be bad; in fact I'd like to see some. Personally, I dislike collapsible templates, except for verb templates and similarly long templates that drown out the other information. Many users coming here do not realize the tables are collapsed. Several times this week, I have had to help people who didn't understand what was going on with collapsible tables, so I'd rather not propagate them to places they don't need to be used. Translations tables, yes, because they grow long. Lengthy Related terms and Derived terms lists likewise. But for some languages and parts of speech, the inflection tables are short. Adding the complexity of collapsing them would, I think, be a net loss. And for highly inflected languages like Latin, I think it would be downright detrimental.
As far as what BD2412 says, I agree mostly. There are standard ways that grammarians lay out inflection tables, and we should follow that as much as we can. However, some specific languages (and their textbooks) do things differently, and we should consider that as well. Color schemes, as I noted above, ought to be allowed to vary. I think the different colors are a plus (though we might work to simplify the vast array we currently have). --EncycloPetey 04:28, 5 January 2008 (UTC)
Further thoughts: I would like each template to have a "Declension of lemma" or similar heading, perhaps linking 'Declension' to an appendix where the language's grammar is described, in addition to linking to the lemma. In terms of style I would like to see each one having a surrounding border and a solid background - perhaps all the light grey that is used in several templates above. Conrad.Irwin 16:29, 5 January 2008 (UTC)
Two quick thoughts: (1) declension only applies to nouns and adjectives (and pronouns); the general term is inflection; the term for verbs is conjugation. (2) This isn't always necessary, since Latin has this information in the inflection line. For most other languages, no appendix exists explaining the grammar. I've pushed for these appendices many times, but they just haven't been written, so there wouldn't be any place for them to link to, in the vast majority of languages. --EncycloPetey 16:34, 5 January 2008 (UTC)
The conjugation tables are all fairly good at the moment (and all pretty much the same), which is why I brought up the declension templates. Obviously if the appendix doesn't exist or there is another link, then there is no point in linking to it. The reason I didn't want to use the word inflection is that it has become incorrigibly confused, on Wiktionary, with the "inflection line." Conrad.Irwin 17:39, 5 January 2008 (UTC)
Actually, red-linking the Appendix page that should exist is a good way to encourage its creation (in the correct place, no less.) --Connel MacKenzie 16:43, 7 January 2008 (UTC)
It is more important to have consistent headings, position and layout. If pushed I would say further that the detailed table design should be similar - if there must be differences between languages, let it just be the use of different colorways. —SaltmarshTalk 06:43, 6 January 2008 (UTC)
Though not really normalization, we should make sure they are understandable by the color blind. We could adopt w:Wikipedia:Colour (or parts of w:Wikipedia:Manual of Style) and use the online tools to test if they work. --Bequw¢τ 18:17, 6 January 2008 (UTC)
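One of the checks such tools perform can be sketched in Python. The block computes the WCAG 2.x contrast ratio between two sRGB colours; it is a contrast check only, not a simulation of any particular form of colour blindness, and the function names and the sample light-grey value are assumptions chosen for illustration.

```python
# Hypothetical sketch of a WCAG 2.x contrast-ratio check for table colours.


def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) colour, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); the argument order does not matter."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Assumed sample: black text on a light grey (#EEEEEE) table background.
header_ratio = contrast_ratio((0, 0, 0), (238, 238, 238))
meets_aa_normal_text = header_ratio >= 4.5  # WCAG AA threshold for normal-size text
```

Running each template's text and background pair through a check like this would flag low-contrast colour schemes, though it would not catch tables that rely on hue alone to distinguish cells.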

Postpositional pronominal forms

What is the correct header for Hungarian postpositional pronominal forms? Please see mögött (behind), the actual postposition, and mögöttem (behind me), one of its inflected forms. --Panda10 12:42, 5 January 2008 (UTC)

I'm not sure we've had to deal with this situation before. I suppose you could call it a Postposition form, just as we have noun forms and verb forms, but I'm not sure that would really capture its grammar. Note that this would mean the POS header would still be "Postposition"; only the category name would be different. --EncycloPetey 15:37, 5 January 2008 (UTC)
Thanks for creating the new category. --Panda10 19:30, 5 January 2008 (UTC)
  • Actually Spanish has a small closed set of words like this: conmigo, contigo, consigo. I think I've seen at least one in Italian too. In Hebrew all prepositions are inflected by the addition of personal pronoun clitics. Hungarian only has a couple of postpositions though, doesn't it? — Hippietrail 21:38, 5 January 2008 (UTC)
  • No, Hungarian has many postpositions; Hungarian uses them instead of prepositions. There are quite a lot of them. The examples you're giving from Spanish are Contractions of a preposition and pronoun, and they function as adverbs. I'm away from my books right now, so I can't remember whether the Hungarian forms are appending the pronoun to the postposition, or appending an associated pronominal ending. --EncycloPetey 23:29, 5 January 2008 (UTC)
Re: Spanish: Not necessarily as adverbs; in general, *, *, and *, such that we have, for example, "el problema conmigo". Re: Hungarian: A pronominal ending that varies somewhat from postposition to postposition, if you believe Wikipedia. (See wikipedia:Hungarian grammar (noun phrases)#Postpositions with personal suffixes.) —RuakhTALK 00:11, 6 January 2008 (UTC)
My Italian Babel is holding steady at 0, but I believe it has , teco (with you), and seco (with him/her/oneself), cognate with the latter two syllables of each of your Spanish examples, respectively. (The Spanish forms are inherently redundant, in that the and both descend from Latin .) —RuakhTALK 21:50, 5 January 2008 (UTC)
A fair number of languages have something like this, including Hebrew and Arabic (although these are prepositional, not postpositional). For instance, Hebrew אל or Arabic ل‎. —Stephen 00:49, 6 January 2008 (UTC)


Whilst assembling the Help:Index I have come across an anomaly: our Help:Reverting points to Help:Reverting on meta-wiki, which refers to the Three-revert rule on Wikipedia - yet our page Wiktionary:Three-revert rule is indicated to be a rejected policy. What is the newbie to make of this? —SaltmarshTalk 07:02, 6 January 2008 (UTC)

A lot of the help pages are in need of a rewrite to make them useful for wiktionary, which is often very different from "standard" wikimedia/mediawiki. I believe that Wiktionary:Three-revert rule is correct, and so it is not a policy, or even a guideline on Wiktionary - the very few cases where there have been revert wars on Wiktionary have presumably resolved themselves by other means. Conrad.Irwin 11:19, 6 January 2008 (UTC)


An anonymous user (User: claimed in April last year that "meme" is pronounced /mEm/ (rhyming with "them") in US English and /mi:m/ (rhyming with "theme") in UK English, updating meme and the corresponding rhymes pages. Both Wikipedia and give /mi:m/ only. The page for meme was rolled back, but not the rhymes pages. Presumably the user was mistaken, but can someone (preferably a US English-speaker) confirm this? I note that User:Hamaryns added "(UK)" to the rhymes page for -i:m, so it is possible that Hamaryns is also the anonymous user. — Paul G 08:49, 6 January 2008 (UTC)

I rhyme it with "theme", but with Internetisms it's always possible there's variation, as different people try to figure out how to pronounce what they're reading. :-P   ("Meme" isn't originally an Internetism, but I think most people know it as one.) —RuakhTALK 14:32, 6 January 2008 (UTC)
I think Richard Dawkins's intention, when he coined the word, was for it to rhyme with gene as his intention was to draw a parallel. - Algrif 15:13, 6 January 2008 (UTC)
Indeed, the OED quotes his 1976 The Selfish Gene as saying:
The new soup is the soup of human culture. We need a name for the new replicator, a noun which conveys the idea of a unit of cultural transmission, or a unit of imitation. ‘Mimeme’ comes from a suitable Greek root, but I want a monosyllable that sounds a bit like ‘gene’. I hope my classicist friends will forgive me if I abbreviate mimeme to meme... It should be pronounced to rhyme with ‘cream’. Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of making pots or of building arches.
But the question isn't Dawkins's intention, but rather how people actually say it.
RuakhTALK 15:34, 6 January 2008 (UTC)
It's the way Dawkins and his team of supporters say it. Is that valid? BTW I'm not a Dawkins supporter personally, but even I say the word like "cream". I would also state (POV) that I am very hard put to think of an English word ending in "e+consonant+e" that isn't pronounced that way. - Algrif 15:41, 6 January 2008 (UTC)
Outside of linguistics (, , , etc.), the only words I can think of ending in are (French loanword, rhymes with "them") and (no clue how it's pronounced — "TRI-ruh-ME", maybe?). Of all of these, is certainly the best-known, to the point that I can easily imagine people thinking comes from French même (same) and pronouncing it accordingly. —RuakhTALK 17:08, 6 January 2008 (UTC)
TRI-reme rhymes with meme. -- Thisis0 17:08, 11 January 2008 (UTC)
Extreme comes to mind. And monotreme. bd2412 T 16:32, 7 January 2008 (UTC)
Who could forget theme, scheme, and supreme! Aw heck, or all the words on this list? -- Thisis0 18:50, 11 January 2008 (UTC)
Either I'm from the UK or anon was just confused, thinking capital E represents "long E" (as we say in US English). DAVilla 15:04, 7 January 2008 (UTC)
It has an intellectual user base, but I still don't think anyone in US would think of a French pronunciation for this (creme). I don't hear it spoken often, but would expect the vowel to sound something like the "e" in gene and to rhyme with "extreme". DCDuring 18:03, 7 January 2008 (UTC)
Hmm. I guess it depends. Commercially, "creme" is used as a fancier spelling of "cream", and in that use it's pronounced the same as "cream"; but in "creme de la creme", "creme brulee", etc., it rhymes with "them" unless someone's trying to make ears bleed. :-P —RuakhTALK 03:39, 8 January 2008 (UTC)
I would argue that it has no pronunciation, but is merely a written word. How many times have you used 'meme' in conversation? ;-) RSvK 05:33, 25 January 2008 (UTC)

Parts of speech of reserved words in computing languages

I'm not sure whether we've decided to include computing languages in the remit of "all words in all languages" (at least, I haven't seen that discussion, but I haven't been around much lately so could well have missed it), but I think we have a problem: does it make sense to say that reserved words in computing languages have a part of speech?

If so, then many reserved words in computing languages inherit their part of speech from English (eg, POKE and GOTO in BASIC function like verbs, "while" in C/C++ functions like a conjunction, and "const" and "void" in C/C++ and "virtual" in C++ function like adjectives). But I think we run into trouble with other reserved words that have a more subtle syntactic relationship with the surrounding content.

I'm thinking of REM (used in BASIC to prefix a comment, that is, text that is there only as a note for humans to read and that is to be ignored by the computer). It is marked as a noun in the entry REM, and this is correct if it is allowed to inherit its part of speech from the English word "remark", which is a noun. If REM is indeed a noun, however, then how does that fit syntactically with the comment that follows? It might make more sense to treat it as a verb; then "REM this line of code is run twice" would "translate" into English as "Remark that this line of code...".

Or perhaps it is simply the case that it doesn't make sense to attribute parts of speech to reserved words, because the syntax of computing languages does not work in the same way as it does in natural languages. Do we then omit the part of speech for reserved words, or artificially try to shoehorn them into the parts of speech designed for natural languages, or create some would-be part of speech (like "Reserved word") for these entries? — Paul G 09:40, 6 January 2008 (UTC)

I am all for including them, along with the Airport codes above, and all other technical terms. It may be that, to gain consensus, these words will have to be relegated to an appendix, but I sincerely hope we can find a way of including these definitions in the main namespace. Conrad.Irwin 11:15, 6 January 2008 (UTC)
Fair enough - if we are to do that, it needs to be discussed, because "all languages" may or may not be intended to include computing languages. However, this is not the issue here. — Paul G 11:21, 6 January 2008 (UTC)
I'm pretty sure that I remember a discussion some years ago, in which someone made the decision that we don't accept programming languages as actual languages (possibly because they don't have an ISO code). However, if you ever want to put it to the vote, I would support their inclusion. Of course, quotations of their use would have to come from published programs, not from reference manuals of syntax. (Oh, and I am thinking that they might all be verbs (maybe the imperative form?).) SemperBlotto 11:40, 6 January 2008 (UTC)
I think REM is an interjection, since it doesn't seem to be an imperative verb, and it forms the entirety of its utterance; whatever follows is simply throat-clearing, and not actually part of the program, and therefore not actually part of the utterance. :-)   Seriously, though, I'm not sure it's worthwhile to classify computer-language keywords by how they're used in the programming language; more meaningful is how they're used in English. "while" is usually used as an adjective or an attributive noun (I'm not sure which) in phrases like "while loop"; "rem", if it's used, is probably used as a noun in sentences like "always include lots of rems so people can understand your code, because you're a BASIC programmer, so it's a given that you don't know how to write self-documenting code" and as a verb in sentences like "I rem'd out these three statements, because it doesn't seem like they can ever be executed anyway" (though of course we'd need to make sure these uses are attested before we included them). —RuakhTALK 14:46, 6 January 2008 (UTC)
Paul, I too dislike "shoehorning" them into regular English parts of speech. Where Ruakh indicates they also have entered the English language proper, it makes sense to have them listed as such, but I think something like ===Keyword=== might work for specific programming language descriptions. Like SB, I would very strongly support a policy proposal to allow programming languages as entries. I think that pretty much all of my personal "deletionist" opinions stem from that prohibition - if it is finally lifted, I personally would have no tenable argument against proper nouns, promotional or not. (Hippietrail pointed out that some restrictions are needed: entries written in binary, for one. And guidelines on punctuation would need to be reinforced, as we would want an entry for html but not [[<html>]].) Since these often are not case-sensitive (but Wiktionary is) we'd need to decide if they should be all uppercase, all lowercase or what. We'd also need criteria on which programming languages to include (presumably starting with ANSI standard languages, branching out from there.) Is "wikisyntax" a language? Lastly, we'd have to decide what language heading to use for each one, as ==Programming== is not specific enough. Language headings like ==Programming: C++== seem particularly problematic. --Connel MacKenzie 20:59, 10 January 2008 (UTC)

UK pronunciations

I'm aware that I've brought this up before, but I'm increasingly uncomfortable with our use of RP. Anyone interested in pronunciation sections please have a look at Wiktionary_talk:Pronunciation#UK_pronunciations and let me know any thoughts. Widsith 10:18, 8 January 2008 (UTC)

Template:nav and topic categories

At the moment, when {{nav}} is used to create topic categories in foreign languages, those categories are automagically included in the parent category for that language. This has the effect of filling up the top category with lots of topic categories and making it harder to find the categories for parts of speech etc.: see Category:French language or Category:Spanish language for examples. The *Topics category for these languages is often left virtually unpopulated.

I propose changing {{nav}} (which is protected, for obvious reasons) so that topic categories in the Xish language (ISO: xx) are placed in Category:xx:*Topics if that category exists, and only placed in Category:Xish language if there is no dedicated *Topics category (smaller languages).

Any comments or objections? Physchim62 12:28, 8 January 2008 (UTC)
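
The proposed rule can be sketched in a few lines of Python. This is only an illustration of the decision described above, not the actual {{nav}} template code; the function name `parent_category` and the sample set of existing categories are hypothetical.

```python
def parent_category(lang_code, lang_name, existing_categories):
    """Pick the parent for a topic category such as Category:fr:Sciences.

    Hypothetical helper: prefer the dedicated *Topics category when it
    exists, otherwise fall back to the top-level language category.
    """
    topics = "Category:" + lang_code + ":*Topics"
    if topics in existing_categories:
        return topics
    return "Category:" + lang_name + " language"


# Illustrative data only: French has a *Topics category, Manx does not.
existing = {
    "Category:fr:*Topics",
    "Category:French language",
    "Category:Manx language",
}

print(parent_category("fr", "French", existing))  # Category:fr:*Topics
print(parent_category("gv", "Manx", existing))    # Category:Manx language
```

Under this sketch a smaller language keeps its topic categories in the language category until someone creates its *Topics category, at which point they move there without any further template change.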

Actually, I favor a stronger simplification: keep only the default topical category and "parent" (which can be defined as *Topics), and drop the "default language" category (either "Xish language" or "xx:*Topics" in your proposition) altogether if a "parent" is defined, because if a topic is in xx:Society, putting it in xx:*Topics seems to defeat the idea of having a category structure to begin with. Circeus 21:21, 8 January 2008 (UTC)
I agree, that is the subject of a second proposition immediately below! :) I split the two propositions because I think that the cleaning out of the top language categories is more urgent. Physchim62 12:56, 9 January 2008 (UTC)
I think this is a good idea, it will help to keep things neater. If no-one objects I will do this in a couple of days, I don't think it is a drastic enough change to warrant a formal vote. Conrad.Irwin 20:18, 23 January 2008 (UTC)

Category tree for topic categories

A related suggestion to the one above, but one which has received less support in private discussions, so I shall phrase it as a question: do we want to have all the topic categories listed in *Topics, or merely the top level categories (eg Category:xx:Sciences)? Physchim62 12:28, 8 January 2008 (UTC)

Merely the top level ones. *Topics is intended to be the root of a topical category tree. The structure of that tree should be parallel across all languages for which the English Wiktionary has categories. --EncycloPetey 02:23, 11 January 2008 (UTC)
Not all languages will need all categories, but if we can keep a single category tree it would very much help linking with other WMF projects (eg, Commons). Physchim62 12:58, 11 January 2008 (UTC)

Why do etymology templates link to Wikipedia?

Why do templates such as {{L.}} and {{F.}} (in this table and category) link to Wikipedia language articles rather than our own entries on those languages? The wiktionary entries include links to the Wikipedia articles anyways, and can provide helpful links to our language Appendices. I could see Wikipedia-linking if we don't have a sufficient article for the language, but is that still the case? And if so shouldn't we fix OUR articles instead? --Bequw¢τ 13:03, 8 January 2008 (UTC)

I always assumed it was due to disambiguation. The Wikipedia articles are solely about the language, whereas the Wiktionary ones are about other things as well, so it could be misleading. Although it is obvious for me, and presumably other Wiktionary editors, that if I see a word's etymology is from Latin or French, a link to the Wikipedia language article makes it certain. Plus, the link to the Wikipedia article gives more information. A link to e.g. w:Malayalam language gives far more extra information than a link to Malayalam. I think a Wikipedia link and automatic categorisation works very well. This way all words stemming from the same language are together, and the reader can easily find, with one click, more information about the language and all its history, phonology etc. --Keene 13:44, 8 January 2008 (UTC)
Yes, and on a further note, I believe we should work to have more interoperability and cross-linking where useful. As sister projects we should not practice isolation, or improving our pages beyond the need to linking to Wikipedia pages which already contain the information. I realize human instinct leads a community to territorialism, protectivism, hoarding, etc. - but asking "Why does a thing link to Wikipedia" reveals a flawed way of thinking that, I think, should instead be continually more inclusive and cross-utilized. -- Thisis0 17:50, 8 January 2008 (UTC)
As Keene has noted, a link to a specific Wikipedia article eliminates possible ambiguity. The WP article identifies the language family, geographic distribution, and the history of the language. It is a more logical place to link for someone who is uncertain about the identity of a language. --EncycloPetey 02:22, 11 January 2008 (UTC)

Yiddish words in Latin characters, revisited

I've brought this up before, and so have others, but there are a number of entries in Latin characters with L2 header "Yiddish". (Yiddish is written in Hebrew characters, so this is an error.) One suggestion was to label them English; another was to keep them as are; another was to label them Yiddish and note that they are transliterations. I'm writing now to suggest another solution and see what people think of it. There's an ISO 639 language (code yib) called Yinglish, which is, according to Ethnologue, "a variety of English influenced by Yiddish (lexically, particularly, but also grammatically and phonetically)". (See also WP's article on Yinglish, although it's not written that well imo. And note also that we list Yinglish on the list of languages.) I suggest that we list these words as Yinglish, but don't want to make that change without some community approval. Your thoughts?—msh210 18:45, 10 January 2008 (UTC)

But see [1].—msh210 19:53, 10 January 2008 (UTC)
That link says that Yinglish was dropped from ISO 639-3 as a language, since it's mutually intelligible with English (just some variances in vocabulary) and is thus a dialect of English. So I guess we should just list these as English, then?—msh210 17:14, 5 February 2008 (UTC)
Personally, I'd like some sort of general-purpose way of dealing with terms that are kind-of between languages, like and (Hebrew acronyms transliterated into English letters but still punctuated and pronounced Hebrew-style), or and (foreign-style phrases coined and used exclusively or primarily by English-speakers), or and (French terms borrowed into English but still often written French-style), or … well, you get the idea. We need some sort of "miscellaneous" language header or something. :-P   Failing that, "Yinglish" sounds good. :-)   —RuakhTALK 02:46, 11 January 2008 (UTC)
Yinglish seems like it is a perfect example of an in-between category. Surprising that it is deemed a language. For the general case where ISO doesn't have a language, how about a master "tweener" (Please substitute good name.) category, with subcategories for the second language involved (if that is even needed because other language should have its header and entry) or for the nature of the "betweenness"? In the case of a multi-language entry where only two languages shared the same sense, we might need something else below language level, below PoS level, at sense level. DCDuring 03:51, 11 January 2008 (UTC)
Really, "Yinglish" works quite well, I think. It definitely covers some of those words, which we could also add spelled properly in Hebrew letters --Neskaya talk 21:22, 19 January 2008 (UTC)

How is this different from something like Japanese rōmaji (which have "Japanese" as a language header)? Mike Dillon 05:20, 11 January 2008 (UTC)

IMO, it isn't, and no such case should be tolerated. But there is entrenched support for romanized Japanese and Mandarin entries here. -- Visviva 01:18, 12 January 2008 (UTC)

Hyphenation and IPA standards

What is the standard for hyphenation and IPA? I've seen different formats in other FL entries. The following example can be seen at magas. Please advise if this is correct:

*Hyphenation: ma·gas
* {{IPA|/ˈmɒgɒʃ/|lang=hu}}

Other variations I've found: hyphenation using a larger bullet to separate the syllables, IPA containing a language attribute.

*Hyphenation: há•zon

Thanks. --Panda10 13:21, 11 January 2008 (UTC)

I personally think the lang= should always be used, otherwise it will link to the entry for English phonology, which is useless if the template is being used to show the pronunciation of a word in any other language. The // vs [] is something else, though. I think they're supposed to mean something, but I don't remember what it is. I always use [], except for languages that I've only seen using // like French, just because I think it looks better. — [ ric ] opiaterein — 15:41, 11 January 2008 (UTC)
Woot [2] — [ ric ] opiaterein — 15:46, 11 January 2008 (UTC)
Thanks for the information, Opiaterein. Based on this, the better option for IPA would be the second of the above two variations (which is IPA|[]|lang=hu). For hyphenation, it seems Wikipedia uses . (period). At least this is what I've seen just above the Brackets table on the IPA page when I clicked on the link you provided. A period would be easier for me to enter. I will wait for more feedback as to which one to go with. --Panda10 21:04, 11 January 2008 (UTC)
The best way to do hyphenation is to use {{hyphenation}}, like this:
* {{hyphenation|ma|gas}}
This way, you merely type a pipe instead of some strange dot character. The difference between // and [] in the IPA has to do with whether the pronunciation given is phonemic (broad) or phonetic (narrow). If the pronunciation is broadly explained, and not overly precise, then // should be used. If the pronunciation transcription is precise and specific (such as for a particular regional dialect), then [] should be used. --EncycloPetey 02:05, 12 January 2008 (UTC)
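
For anyone generating these lines in bulk, here is a minimal Python sketch of the two conventions just described: pipe-separated syllables for {{hyphenation}}, and slashes versus square brackets for broad versus narrow IPA. The helper names `hyphenation` and `ipa` are made up for illustration, and the assumption that {{hyphenation}} takes one syllable per positional parameter rests only on the "type a pipe" remark above; the {{IPA}} form follows the magas example earlier in this thread.

```python
def hyphenation(*syllables):
    """Build a {{hyphenation}} call, one syllable per pipe-separated
    parameter (assumed from the 'type a pipe' description)."""
    return "{{hyphenation|" + "|".join(syllables) + "}}"


def ipa(transcription, lang, narrow=False):
    """Wrap a bare transcription in // (phonemic, broad) or
    [] (phonetic, narrow) and add the lang= parameter."""
    if narrow:
        wrapped = "[" + transcription + "]"
    else:
        wrapped = "/" + transcription + "/"
    return "{{IPA|" + wrapped + "|lang=" + lang + "}}"


print(hyphenation("ma", "gas"))          # {{hyphenation|ma|gas}}
print(ipa("ˈmɒgɒʃ", "hu"))               # {{IPA|/ˈmɒgɒʃ/|lang=hu}}
print(ipa("ˈmɒgɒʃ", "hu", narrow=True))  # {{IPA|[ˈmɒgɒʃ]|lang=hu}}
```

Keeping the delimiters out of the stored transcription means the broad/narrow choice is made in one place instead of being retyped in every entry.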
This is very helpful. Thanks. --Panda10 02:20, 12 January 2008 (UTC)
A dictionary should have very few occasions to use phonetic transcriptions, incidentally. We should be aiming to generalise as much as is sensible, and allophonic and personal variation are not relevant, IMHO. --Wytukaze 20:10, 15 January 2008 (UTC)

Lingua Franca Nova

I'd like to know others' opinions about Lingua Franca Nova, I wasn't aware of the restriction on adding non-coded alt-langs to Wiktionary and I would still really like to add it. So, please be honest and realize I'm not militant, just really encouraged by how nice of a language LFN is...well, to me at least. I am told it will also need a WMF language code (e.g. art-lfn), so any help with making this happen would be much appreciated.

Thanks for reading. --Sano 15:58, 11 January 2008 (UTC)

As a constructed language with fewer than 200 speakers, it's unlikely to be considered acceptable for inclusion. There are other, more widely used constructed languages that do not meet our criteria for inclusion. However, you could create an Appendix listing vocabulary, as we have done for Appendix:Quenya. --EncycloPetey 01:59, 12 January 2008 (UTC)
What!? That's kinda silly, don't you think? I can make a word-list in an appendix, but I can't simply add info in other places? That seems to me kinda like having a special wall just for graffiti on the back side of the building, but all the other pictures are murals because they appeal to more people...whatever, no big deal. LFN isn't my language so I'm not really worried about it...I was just curious and hoping to add it. --Sano 03:48, 12 January 2008 (UTC)
It's not silly - if it were not for the "minimum number of speakers" criterion, any of the zillion stupid conlangs would merit inclusion. It doesn't matter who created it or for what purpose; if it doesn't have ISO code, it usually means it's irrelevant. There are many other bastardized descendants of Vulgar Latin you can contribute to though ^_^ --Ivan Štambuk 13:12, 12 January 2008 (UTC)
You Sir, are not invited to my house for arts and crafts day...I'm afraid you'll ruin all of the children's egos. --Sano 00:41, 14 January 2008 (UTC)

Attention to all: An Announcement

This was after I emailed the ISO 639-3 Registration Authority and asked what plans they had for LFN...and now we see. (*politely sticks tongue out at nay-sayers) --Sano 00:34, 17 January 2008 (UTC)
Please be advised that the existence of an ISO language code does not mean that a language qualifies for inclusion. ISO codes are merely a convenience here. We have words in a number of languages that have no ISO code (such as extinct languages or languages of aboriginal Australia), and there are languages with ISO codes that are specifically disallowed (such as Quenya and Klingon). Lingua Franca Nova will still fall into the latter category, even with an ISO code. --EncycloPetey 02:20, 17 January 2008 (UTC)

Since you're asking for opinions, my opinion is that it doesn't merit inclusion. Mike Dillon 04:09, 17 January 2008 (UTC)

So, I guess I'm relegated to making an index? In that case I can make an index for any language that I please? I ask because after looking at the criteria for inclusion very briefly, I don't see why not, unless the situation is simply that there is some sort of picky-ness going on here that I am missing... --Sano 15:18, 17 January 2008 (UTC)

Not an index, but an appendix. These are two different namespaces. But to answer your question, we have formally voted to allow Appendices like Appendix:Quenya and Appendix:Sindarin. We have not formally decided whether this extends to languages like Lingua Franca Nova, but I expect that it would. You could probably create Appendix:Lingua Franca Nova without opposition. --EncycloPetey 01:37, 18 January 2008 (UTC)
Right, finger vs organ, got it. So...when does the tribunal convene? And what's this about opposition? Is that like an insurgency? Do I need to reinforce my borders or something? --Sano 01:45, 18 January 2008 (UTC)
No, it's just that some artificial languages are more controversial than others. Klingon was discussed quite a bit the last time the issue was raised. And Brithenig has been actively discouraged. You'll notice in CFI that Brithenig is termed "not yet approved" for inclusion, but Lingua Franca Nova is described as having "no consensus", which means either no vote has happened or a vote happened but reached no conclusion. In this case, I think it's that no vote has happened. --EncycloPetey 01:54, 18 January 2008 (UTC)
Controversial? What!? It's not like I'm proposing pages upon pages of porn or abortion's a freakin' that is accepted, by a good many people, and one that I think merits inclusion just as much as Esperanto or Ido. Whose feathers must one flick to get some action on the approval of LFN for some sort of inclusion? Or is it one of those things where there will be endless academic discussion without any discernible results? --Sano 18:38, 18 January 2008 (UTC)
Esperanto is just barely included. I think the bar has to be set there because swathes of literature have been written in or translated into Esperanto; Esperanto has a cultural history; there have been geographical communities that chose Esperanto as an official language; and some descendants of generations of Esperantists could even claim to be native speakers. Not counting Esperanto would be almost tantamount to not counting Modern Hebrew, as it is a reconstructed language & some people have lived their whole lives using Modern Hebrew and not fully grasping any other languages. I don't think it's about flicking feathers. I think it's about getting hundreds of people to use a working version of LFN, teaching it to their children and grandchildren, posting original works in it, translating hundreds of books older than a century into it, lobbying governments to the point they seriously consider making it official, and waiting half a century for this work to become public knowledge. This is my opinion and I love conlangs/ auxlangs. If there were no appendix to add a conlang, I'd be up in arms, but I can see reason in the way we do things. In fact, for a decade I've wished books could be written with IPA superscripts and Han subscripts surrounding every sentence, so that non-native users can immediately both pronounce and comprehend such books, no matter what language or script it is written in and thus read it out to a native speaker. w:Furigana takes it part of the way, but the w:kana syllabary has a low number of phonemes and the w:okurigana make no sense to the Chinese, a huge language group. --Thecurran 08:57, 21 January 2008 (UTC)

Category:Colloquial anatomy

Could we have a category subordinate to Category:Anatomy for colloquial anatomical terms (just see the extensive list of synonyms for forefinger)? If need be a category (or two) for euphemisms or vulgar terms could be appended to that then. I make a suggestion in the header for what the name of such a category should be. Another would be Category:Colloquial anatomical names. __meco 02:37, 12 January 2008 (UTC)

I made {{anatomy slang}}. That might be a good start. Quite a fun project too. --Keene 00:11, 13 January 2008 (UTC)
I'll listen to other suggestions before expanding it. --Keene 00:16, 13 January 2008 (UTC)
Why is it terms now end up in three topical categories? Also the template itself should be categorized into the new category. Shouldn't the template code take care of all of that? __meco 11:59, 13 January 2008 (UTC)
You can use {{subst:new label}} to start a context template that conforms to the usual norms -- or at least to what those norms were a few months ago; nobody seems to bother writing anything down around here, so probably everything has changed since then. However, for a context template to work optimally we need to figure out what the optimal category name would be ... "Anatomical slang" strikes me as ambiguous: slang about anatomy or slang used among anatomists? (both exist...) I would suggest that the category be Category:Slang terms for body parts and that the visible label simply read "(slang)," since these terms (for the most part) would not be used by anatomists. -- Visviva 13:42, 13 January 2008 (UTC)
Not all colloquialisms are really slang. And, I don't think all anatomical colloquialisms are terms for body parts, though this is a bit debatable; I'm thinking of terms like and and . —RuakhTALK 14:38, 13 January 2008 (UTC)
I think that colloquial rather than slang is the most apt, like the term lickpot for the forefinger or index finger (those two should fit in the anatomy proper category). __meco 18:16, 13 January 2008 (UTC)

Upcoming vote on attestation criteria

Please make any last comments on Wiktionary:Votes/pl-2007-12/Attestation criteria before the vote starts. Of course it can be delayed if there are any major problems. DAVilla 15:29, 12 January 2008 (UTC)

I'll need to give the current wording careful thought, but one item does stand out, and that is "classical work". Classical has too many possible meanings to be a good choice for this principle, including one sense that would limit its application to translations of ancient Greek and Latin literature. We need to find a better way to say this. I can't think of anything useful for this at the moment. --EncycloPetey 17:45, 12 January 2008 (UTC)
If you think of anything, feel free to edit directly and just delay the vote by a day or two. However, I wouldn't say it has to be any more well defined than "clearly widespread". DAVilla 00:32, 22 January 2008 (UTC)

I've made major revisions to clarify, but then probably went too far again. Feedback on the idea of age of works would be greatly appreciated. DAVilla 17:34, 24 January 2008 (UTC)

Comparative forms

The current model I follow for comparative forms of Hungarian adjectives is a simple statement "Comparative form of xyz" without an English translation. The text is generated by a template. I'd like to add the translation but not sure where. Please look at magasabb (taller), the comparative form of magas (tall). It would be more meaningful to see "taller" instead of "Comparative form of magas" in the definition line. The explanation could go either under Etymology or Usage notes. Another example is olcsóbb, where the translation is added before the template but looks strange because the template starts Comparative with a capital letter. Do we really want to add a separate entry for all these inflected forms for adjectives? --Panda10 20:55, 12 January 2008 (UTC)

The "goal" of wiktionary is to have an entry for every word in every language (or something) so yeah, we want all those forms. You don't have to add them right now, though. They're secondary to base forms.
If I want to put a definition along with a form-of word, I put it after the form-of information. See the very minor edit I put into magasabb. — [ ric ] opiaterein — 22:01, 12 January 2008 (UTC)
I'd like that with some minor changes in the template to make it look like other similar templates:
  1. (comparative form of magas) taller
This would require the following changes in the template: comparative with small c, no period after magas, italics, parentheses. Is this feasible? This template is used in other FL entries, too. --Panda10 23:12, 12 January 2008 (UTC)
FYI, this whole topic is a matter of ongoing debate. —RuakhTALK 04:14, 13 January 2008 (UTC)
Yeah... anything "form-of" related is touchy touchy stuff. — [ ric ] opiaterein — 16:11, 13 January 2008 (UTC)
There is good reason not to do it this way, at least in general. I think it was EP who had commented on it best, but I'm not sure where. Not everything form-of related is touchy though. Some things are just shaky. ;-) DAVilla 18:03, 21 January 2008 (UTC)

Wiktionary:Beer parlour, a free definition from Wiktionary

Seriously? Lol... Surely we can do something better than "a free definition", especially considering that there are a lot of words with more than "a" definition in one language. — [ ric ] opiaterein — 21:58, 12 January 2008 (UTC)

Bad (very bad) attempt to add keywords to Google. Gone again. (Google would have ignored it the very moment they noticed it anyway; they do any number of things like that.) Robert Ullmann 22:41, 12 January 2008 (UTC)
Move back to WT:GP. --Connel MacKenzie 01:20, 13 January 2008 (UTC)

Timezones

At User:Connel MacKenzie/timezones#Results I have the timezones "recognized" by Java on toolserver. (YMMV.) How can these be worked into Wiktionary? Obviously we have to skip all the ones with "/" in the entry title, but what of the rest? Appendix or something? --Connel MacKenzie 04:42, 13 January 2008 (UTC)
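
As a rough sketch of the "skip the ones with a slash" step, the following Python separates time zone ids into those that could be plain entry titles and the composite ones. The function name `split_by_slash` and the short sample list are illustrative only; the sample is not the toolserver output.

```python
def split_by_slash(zone_ids):
    """Return (plain, composite): ids without '/' that could be used
    directly as entry titles, and ids with '/' that could not."""
    plain = sorted(z for z in zone_ids if "/" not in z)
    composite = sorted(z for z in zone_ids if "/" in z)
    return plain, composite


# Illustrative sample only, not the real list from the results page.
sample = ["America/New_York", "UTC", "Europe/London", "EST", "GMT"]
plain, composite = split_by_slash(sample)

print(plain)      # ['EST', 'GMT', 'UTC']
print(composite)  # ['America/New_York', 'Europe/London']
```

On Python 3.9 or later, `zoneinfo.available_timezones()` could supply a fuller list of ids to feed into the same function, provided time zone data is installed on the machine.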

The ids that are used by the code for getAvailableIDs() are not really the ones that a person would use to refer to the time zone. I've taken the liberty of re-running similar code and regenerating the list. Feel free to revert it or move it to another page if you like. Mike Dillon 05:37, 13 January 2008 (UTC)
P.S. I have locally adapted the Groovy code I posted to run this for multiple Locales. Mike Dillon 05:54, 13 January 2008 (UTC)
Here's the output for all locales that my JDK supports: User:Mike Dillon/timezones. I've made it only print unique names and dropped the timezone ids since they aren't what people call the time zone. Some of the countries used in the locales are arbitrary (e.g. "es_PE" for Spanish), but the list is unique and I don't think they're necessarily country-specific (except zh_TW v. zh_CN). Mike Dillon 06:31, 13 January 2008 (UTC)
The problem I ran into, was setting up a new "personality" for the bot - asking for the desired timezones in long format. I've no idea where to find the official list of composite names (that include the slash, as is required in that context.) I'm not sure what RFC might cover them...the ones I saw deferred the issue. Anyhow, FANTASTIC STUFF MIKE! --Connel MacKenzie 06:48, 13 January 2008 (UTC)
Mike, your outputs make the problem slightly worse, as it is even more unclear which of those meet CFI. Does "inclusion in a global protocol" merit inclusion for a place name? Or rather, shouldn't it? --Connel MacKenzie 06:53, 13 January 2008 (UTC)
I believe those timezone ids are the ones from the Olson database. I'm not sure they're covered by an RFC, but the Olson DB seems to be pretty widely accepted as a (descriptive) standard for the use of time zones on computers. I'm not sure where the JDK got the names from. Since there isn't a standards body defining these things, I think lists like this one can only be used as a starting point for where to look for citations; I can't see including them wholesale. Mike Dillon 06:58, 13 January 2008 (UTC)
Excellent information - thank you. Since that is public domain I don't see any reason not to include them all. Comments from others are appreciated... --Connel MacKenzie 08:12, 13 January 2008 (UTC)
I think that, where possible, we should try and include entities from standards, but I suppose it depends under what licenses the information is obtainable - there would be no reason (as with Airport codes above, and Chemical symbols) not to include an entry - even if there was nothing more to say about it than link to the 'pedia article. I am all for including everything! Conrad.Irwin 13:45, 13 January 2008 (UTC)
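
Editorial illustration: a minimal Python sketch of the filtering discussed in this thread, in which ids containing "/" are set aside because they cannot be used as entry titles. The sample ids and both helper names (`split_by_slash`, `display_component`) are hypothetical and are not taken from either user's list or code.

```python
# Hypothetical sample of Olson-style zone ids; NOT the list from the thread.
SAMPLE_ZONE_IDS = [
    "UTC",
    "GMT",
    "America/New_York",
    "Europe/Paris",
    "Asia/Tokyo",
    "EST",
]


def split_by_slash(zone_ids):
    """Partition ids into those without "/" and those with it."""
    plain = [z for z in zone_ids if "/" not in z]
    composite = [z for z in zone_ids if "/" in z]
    return plain, composite


def display_component(zone_id):
    """Return the last path component, showing underscores as spaces."""
    return zone_id.rsplit("/", 1)[-1].replace("_", " ")


plain, composite = split_by_slash(SAMPLE_ZONE_IDS)
print(plain)
print([display_component(z) for z in composite])
```

Whether any of the resulting names would meet CFI is a separate question that the sketch does not address.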

Slashes in page names

Connel mentioned in #Timezones above that slashes are not allowed in page names. After the creation of the Citations namespace, are there any more conventional uses of subpages in the main namespace? If there aren't, we could get the developers to turn off subpages in the main namespace to allow slashes in page names where appropriate (it's a per-namespace setting in MediaWiki). I can't say one way or another whether there are any otherwise acceptable entries with slashes in their names, but I don't see any reason to keep the restriction if we aren't actually using subpages. Mike Dillon 06:24, 13 January 2008 (UTC)

Well, I think subpages are turned off for NS:0. But "/experiment" and "/citations" pages exist for many NS:0 pages, that have to be programmatically excluded from various things. The only request I've seen in the past (from Hippietrail) was to have subpages turned on for NS:0. --Connel MacKenzie 06:51, 13 January 2008 (UTC)
I don't think they are turned on, see lead/experiment - there is no link at the top back up again. Compare with User:Conrad.Irwin/anger which does have the link back up. Conrad.Irwin 13:48, 13 January 2008 (UTC)
Well, we have n/a and and/or. —RuakhTALK 14:31, 13 January 2008 (UTC)
... and , , , and . Rod (A. Smith) 18:02, 13 January 2008 (UTC)
There have been a few cases where etymologies or word histories have been set up as subpages the way citations used to be. As far as I know, this never caught on widely, and there are only a few instances of such a thing. However, I do not recall which words had such pages established. --EncycloPetey 01:16, 14 January 2008 (UTC)
Category:Citations should be a reasonable starting point. Conrad.Irwin 01:02, 15 January 2008 (UTC)
User:Conrad.Irwin/Citations and User:Conrad.Irwin/Not citations give the full list of pages with a forward slash in the title as of the last XML dump. It would appear that the Citations pages will probably need the attention of an automaton - which given time I would like to have a go at - while the other list contains several pages that should probably be deleted. Conrad.Irwin 02:02, 16 January 2008 (UTC)
In the browser URL bar & many other computing situations, %20 = " ", %21 = "!", %22 = """, %23 = "#", %24 = "$", %25 = "%", %26 = "&", %27 = "'", %28 = "(", %29 = ")", %2A = "*", %2B = "+", %2C = ",", %2D = "-", %2E = ".", & %2F = "/". We already re-interpret " " -> %20 as "_". Isn't there some way to name pages with %2F after the "" instead of the subdirectory-denoting "/" to help our search engines out? --Thecurran 08:15, 21 January 2008 (UTC)
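
Editorial illustration: a minimal Python sketch of the percent-encoding round trip described in the comment above, using the standard library's `urllib.parse.quote` and `unquote`. The helpers `encode_title` and `decode_title` are hypothetical names. The sketch shows only the encoding itself; it makes no claim about how the wiki software would treat a title containing %2F.

```python
from urllib.parse import quote, unquote


def encode_title(title):
    """Replace spaces with underscores, then percent-encode the rest.

    safe="" makes quote() encode "/" as well (by default it is left alone).
    """
    return quote(title.replace(" ", "_"), safe="")


def decode_title(encoded):
    """Reverse encode_title (any literal underscore also becomes a space)."""
    return unquote(encoded).replace("_", " ")


for title in ["n/a", "and/or", "1:1", "day after tomorrow"]:
    encoded = encode_title(title)
    print(title, "->", encoded, "->", decode_title(encoded))
```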
Long-term, can we get them to turn off all special characters? That would be so much more elegant. DAVilla 17:57, 21 January 2008 (UTC)
In particular the colon? That'd be great for a lot of Swedish abbreviations... ;) (Yes, I understand it ain't likely to happen). \Mike 18:13, 4 February 2008 (UTC)
Yes, including the colon. You would probably have to escape it to make the link work, e.g. [[1&colon;1]] (one to one), but there isn't any reason that the namespace can't be treated as a completely separate field. DAVilla 00:53, 10 February 2008 (UTC)

Translingual ISO & IUPAC Symbols

I don't feel comfortable spacedocking from CH#translingual's example of Switzerland, DE's, ER's by user:SemperBlotto, or GR to post the w:ISO 3166-1 alpha-2 country code for each of the 192 independent UN member states (as used in ccTLDs) with a complex sentence blurb, because it would tread on many more toes than simply following the exhaustive example from the postal abbreviations of each US state within the English sections, despite their international use in w:ISO 3166-2. I feel the same way about the w:ISO 3166-1 alpha-3 codes that will take prominence in the upcoming Olympics. I do, however, wish to applaud H#translingual's lead on the symbol for Hydrogen as well as the rest of the chemical elements. I'd like to update them with translations to their official Chinese symbol counterparts, though. --Thecurran 04:41, 14 January 2008 (UTC)

Hm, the symbols of the chemical elements really are translingual, in that they are also used in languages without Latin scripts: hence Cu is the symbol for copper (from Latin cuprum), whereas the character for copper is 銅 (铜 in simplified characters, tóng in Mandarin, dō (どう) in Japanese, dong (동) in Korean). Just a thought! Physchim62 12:31, 14 January 2008 (UTC)
Your point is well-taken. It's just that I feel that people have put a lot of work into making single glyph chemical symbols for Simplified Chinese, which is an official UN script, that the one-to-one correspondence is elegant, and that many of these Han symbols are used by a quarter or more of the world's population, ranging widely geographically and truly trans-lingually, though not universally. If this argument is not strong enough, I'm happy to forgo this project.:)--Thecurran 13:42, 14 January 2008 (UTC)
No forgoing! I'm sure there's a policy against that :). The main issue is that "Translingual" is not a black and white property. The symbols are translingual in the sense that they are used in more than one language, but they are not translingual in the sense that they are used in all languages. What would be better would be to have a more specific heading, though how to do that would stretch the Wiktionary discussion rooms to breaking point. Though I see no immediate problem with them being put under English or Translingual, they are both nearly correct, we will probably have to come to a decision at some point. (Food for thought) A scale of increasing translinguality: Place names; Given names; Brand names; Internet slang; Country codes & Language codes; Chemical symbols; Currency codes; Airport codes. </original research>. Conrad.Irwin 18:29, 14 January 2008 (UTC)

Request of permission

  • User: FiloSottile (it) [3]
  • Name: --BotSottile 17:11, 14 January 2008 (UTC)
  • Software: pywikipedia
  • Tasks: interwiki
See note on talk page; we have a much more efficient bot User:Interwicket that updates all of the iwikis here. We will not approve another iwiki bot unless it is specifically doing only new pages on wikts other than the ones already covered by VolkovBot.
the wikipedia-style interwiki process is not really appropriate to the wikts, and even many months of running will leave many iwikis missing, particularly for smaller wikts. Interwicket's last run added ~45K new iwikis to 30K entries, doing all of them in one pass. Robert Ullmann 17:17, 14 January 2008 (UTC)
No thank you. --Connel MacKenzie 18:43, 14 January 2008 (UTC)

Translations of "Translingual" words

Translations are currently allowed only under English entries. I think they should be allowed under Translingual entries as well, because in most cases translinguality is actually limited to a small sample of the 216 languages of Wiktionary. As an example, there is currently no place where one could enter a Japanese (or any other) translation to the word Eukaryota. This is not only a question of transliteration, since Eukaryota is not a valid word in many languages that are written with Latin characters either. As an example, in Finnish the word eukaryootit is used in scientific context, but the high school biology books talk about aitotumaiset, which is not a colloquial term, but the recommended standard Finnish word which is completely acceptable in scientific text as well. I could easily list hundreds of "Translingual" terms which have the same or a related problem. Furthermore, among languages using Latin characters, the problem is not limited to Finnish. It applies to Swedish, German and probably to a large number of other languages. One could even argue that most Translingual entries are actually English - which argument leads to a simple solution: if the header "Translingual" were changed to "English" in those articles where translations exist, one could "legally" enter them. Hekaheka 15:56, 17 January 2008 (UTC)

Not knowing any Finnish, perhaps I shouldn't doubt you, but are you sure Eukaryota is not used? Compare English, where Eukaryota is the "scientific" name, but eukaryote is the term used everywhere, including, I suspect, in a scientific context. Eukaryota, when used in English, is recognized as foreign (people informally call it "the Latin name"). Is it not used in that way in Finnish? I thought taxa were truly translingual.—msh210 16:28, 17 January 2008 (UTC)
The taxa are also only kind of translingual. They are acceptable, but not likely to be widely understood. For example, it would not exactly be an error to use the term Asteraceae or Compositae in a Finnish or Swedish non-scientific text, but it would probably be understood only by botanists. Instead, mykerökukkaiset or korgblommiga växter would be understood by almost everyone. These terms have an exact meaning in Finnish and Swedish respectively. Hekaheka 10:11, 18 January 2008 (UTC)
But to answer your main point about including translations of translingual words into languages that don't carry those words, yes, I think we should allow it, under a Translations header. We should also list languages that do use the translingual word, either under a new header designed for that purpose, or (perhaps) under Usage notes. But for terms that are the same in many languages (such as ), even though they don't exist in all languages, I support the Translingual header as opposed to listing every language individually (which seems to be what you're suggesting in your last sentence, Hekaheka).—msh210 18:28, 17 January 2008 (UTC)
In many cases the word will be listed in individual languages anyways, particularly if it has additional meanings that are not translingual. I'm not sure how often that would be true, but it will probably come up commonly for initialisms like airport names and borrowed words like taxi. DAVilla 11:19, 25 January 2008 (UTC)
We should certainly permit translations for Translingual terms, if English is among the languages in which the term is used. Few if any Translingual terms are truly universal; they just happen to be shared among a large number of languages. -- Visviva 17:24, 17 January 2008 (UTC)
agree. Very much unlike translingual symbols like, say the male/female symbols, or . Circeus 20:32, 17 January 2008 (UTC)
Even symbols can be iffy; there's a Hebrew alternative plus sign (﬩), for example (though in my experience the ordinary plus sign + is more common in Israel). —RuakhTALK 01:08, 18 January 2008 (UTC)
Wouldn't a seealso/related header do the trick? Circeus 21:39, 18 January 2008 (UTC)
I take issue when there is no difference in the glyph, only in the code point, e.g. for variations of capital A. Anything that looks identical should be listed on the same page, and the other pages when necessary can redirect. DAVilla 11:25, 25 January 2008 (UTC)
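
Editorial illustration: a minimal Python sketch of the situation described in the comment above, where characters look identical but occupy different code points. The three characters chosen (Latin, Greek and Cyrillic capital A) are an assumed example and are not necessarily the variations the commenter had in mind. `describe` is a hypothetical helper built on the standard `unicodedata` module.

```python
import unicodedata

# Three capital letters that usually render with the same glyph.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(ch, describe(ch))
# A U+0041 LATIN CAPITAL LETTER A
# Α U+0391 GREEK CAPITAL LETTER ALPHA
# А U+0410 CYRILLIC CAPITAL LETTER A
```

A page title is compared by code point rather than by appearance, which is why each of these would otherwise get its own page.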
Since this would be a major change in the way we do things, I'd want to see a couple of examples of the proposed formatting, and probably a vote. I'm against the idea, though I do understand the issue and the need, and would be willing to change my mind if a suitable format existed to address potential issues. --EncycloPetey 01:39, 18 January 2008 (UTC)
Tossed together one possibility at User:Visviva/Canis. Not sure about placement of the notes; I think that in general the Usage notes section, wherever it is, should include pronunciation and inflection information for each language where this is available. I'm not sure why this would need a vote (but then again, I don't really understand why anything needs a vote). -- Visviva 16:29, 18 January 2008 (UTC)
User:Visviva/Canis is good.—msh210 18:41, 21 January 2008 (UTC)
So, my question then is: Does Hebrew use כלב as the name of the taxonomic genus, or is it simply the word for "dog"? This same question could be applied to any of the other translations in the table. Is Hebrew כלב a proper noun used by Hebrew biologists to refer to the taxon, or just the common noun referring to the animal? --EncycloPetey 05:09, 24 January 2008 (UTC)
It's the name of the genus, and also the ordinary word for dog. w:he:כלב ([[w:he:Dog]]) is about the domestic dog; in the taxobox's "scientific classification" section, it lists the genus as כלב, which is a link to w:he:כלב (סוג) ([[w:he:Canis (genus)]]). My impression is that the Latin-style names are used as glosses in order to coordinate with the rest of the world, but that in normal text (including by biologists), the Hebrew names are used. (But, I'm by no means an expert on this subject. If you want, I can ask at the Hebrew Wiktionary and report back.) —RuakhTALK 06:16, 24 January 2008 (UTC)
The CJK translations are all basically [dog] + [taxonomic genus], and are used in taxonomic literature and elsewhere (including field guides and encyclopedias such as the relevant Wikipedias). Similar compounds exist for most taxa. The status of species names is perhaps a separate debate. -- Visviva 07:30, 24 January 2008 (UTC)
I'd like to know what qualifies as translingual. I believe it's reasonable to call a current taxonomic name translingual. I mean, I bought a snow monkey photo booklet for under $10 in a Japanese park, very much written for Japanese layfolk, and even it uses the same Latin script nomenclature in a few notes in the back. Besides, most English-speaking layfolk I know couldn't describe a Eukaryote or what falls under Eukaryota. That's why the pseudo-names in roadrunner cartoons are so funny. I still think it would help to have a translingual entry like a species' name link through to its common name in other languages. Under the current system, it would seem best to link the species name to an English common name, and bear translations from there. Whether you put it on the same page or list it as a See Also or Synonym doesn't concern me. I further think that Hebrew plus signs as well as numerals from systems besides the Hindu-Arabic, etc. are important and may be used more frequently in many situations, but in inter-ethnic communication, these "other" characters are superseded. As such, I assert the international Taxonomic nomenclature and Scientific and Mathematical Symbols are truly translingual and we just need to keep up with ISO, IUPAC, IUPAP, ICZN, ICBN, etc. Special things like ♂ should be noted as meaning male almost everywhere but it competes with 火 to mean Mars. :) babl --Thecurran 01:39, 19 January 2008 (UTC)
There is a proposal in this upcoming vote that "Translingual terms must be verifiable or approved in each of at least three languages that are not closely related, or dictated by a recognized body of international standards." DAVilla 11:19, 25 January 2008 (UTC)

Commons pictures

I know we use Commons pictures frequently here, but wonder whether there's some specific subset of Commons pictures we're allowed to use, or whether, on the contrary, any Commons picture at all may be used. (I'm confused because my understanding is that Wiktionary aims to be GFDL, whereas many Commons pictures are under other, incompatible, licenses, such as cc-by-sa.)—msh210 18:15, 17 January 2008 (UTC)

My understanding is that we use whatever commons: has, deferring all media-licensing issues to them. --Connel MacKenzie 19:42, 17 January 2008 (UTC)
Thanks!—msh210 19:46, 17 January 2008 (UTC)
Cc-by-sa licenses have heretofore been considered GFDL-compatible, both on Commons and also on EN Wikipedia et al. Cc-by-sa-nc would be another matter, but fortunately those aren't allowed on Commons anyway. -- Visviva 15:40, 18 January 2008 (UTC)

Orthography & Spelling

hello everyone,
in the Tee Room of the German wiktionary a controversial discussion is going on about how to deal with having different types of english spelling rules in different anglophone countries (UK, USA, Australia, etc.) and how this will affect english articles in de.wikt.
one of us came up with the idea of avoiding duplicated articles that just differ in spelling by introducing a template like this {{american spelling|[[word]]}} which should indicate that this particular spelling of a word is only formally allowed in the US. another one of us responded by stressing that this template could cause more irritation, because e.g. in Australia both spellings (british and american) are formally allowed; in Canada no standardised spelling regulation exists. so, for instance, when you create an article for an english verb ending with -ise / -ize you could choose one spelling as the main spelling - for instance the british one. if the verb has the same meaning in all english speaking countries you could create another article (with the second spelling) that now contains only the template indicating that the verb with this particular spelling is only used in the US but has the same meaning like the verb spelled british. but then you recognise both spellings being correct in Australia. so you should create another template. one for the australian spelling which should be added in both articles. and this could go on and on!!
so, the point is how are you dealing with articles that have exactly the same content/meaning and just differ in spelling? could someone of you give us a hint to solve this problem? which one is the main spelling in en.wikt? is there any at all? how do you deal with e.g. -or / -our or with -ise / -ize? do you simply duplicate such articles? -- thank You in advance, cheers Caligari, 22:55, 17 January 2008 (UTC)

While that would probably be a good idea, we can't seem to decide which spellings should be the main ones. — [ ric ] opiaterein — 01:40, 18 January 2008 (UTC)
We don't have just one way to deal with these problems. We have several approaches in use, and have not agreed yet on any single solution. --EncycloPetey 01:44, 18 January 2008 (UTC)
Hello EncycloPetey,
can You show me what kind of approaches you have in use? and can you show me where to find the main discussions here about this spelling issue? - cheers, Caligari, 15:36, 18 January 2008 (UTC)
Part of the issue is also that, with increasing internationalisation, spelling (not to mention usage) has come to be more fluctuant in the UK (and in some cases, the U.S.) than it used to be (much to the chagrin of the linguistic conservatives, of course). And that's ignoring the issue of Canadian (somewhere halfway) and other variant spellings (I know nothing myself of usual Indian, South African or Australian spelling). Circeus 21:42, 18 January 2008 (UTC)
In many cases, I think we double the pages, as dictionary entries are short, unlike in Wikipedia where a re-direct would be used to whichever form was used when the page was first written. Generally last century, most English-speakers would understand Canadians most easily, having read British English but heard American English, thanks to the body of British Literature and American media. With the rise of the Internet and multi-nationals, however, American spelling is on the rise, even in many English schools for foreign speakers.
Personally in international contexts, I try to use -ize (US) to separate from "advertisement" (universal) and -our (UK) to separate from "error" (universal), because such splits demonstrate more etymological history and being of neither type is neutral and kind of acceptable to other Commonwealth speakers. I watch Deutsche Welle news in English and they use American accents and seem to avoid words that cause such controversy by using other synonyms.
If it becomes a real problem, just take a vote and settle on the Oxford English Dictionary (UK) or Webster's (US). In Australia, most modern UK terminology never took root, thanks to US market dominance; "truck" is used, instead of "lorry". Unless the EU decides quite soon to firmly establish British English, I imagine such market forces will see American spelling and pronunciation overshadow British, despite geographical considerations. Considering that the UK is yet to adopt euros and other signs of wanting to stay different from continental Europe like left-side driving, I just can't see that happening.
In five years, I imagine the argument will be settled in favour of the US style, so if you had to vote, I would go with Webster's. Once again, I personally wouldn't choose either. :) --Thecurran 10:07, 19 January 2008 (UTC)
Thank You all for commenting on this issue. I just wrote in the german Tee Room that You have no clear position on spelling either and that You just duplicate complete articles that differ in spelling. I also mentioned there that You suggest first to vote for a main spelling and settle on the Oxford English Dictionary or on Webster's. - Thank You again, best regards, Caligari 13:31, 19 January 2008 (UTC)
Er, I may be wrong, but I don't think those statements are quite accurate. We aim to cover all forms of all words in all languages, so of course both spellings should be included. Which one is treated as primary, and is favored with translations et al., is of course subject to debate; however, voting on such an issue would be highly counterproductive. Our typical de facto approach is seen in entry pairs such as realize and realise. Duplicating entries by hand, although superficially attractive, creates serious maintainability issues. One attempted remedy is Template:color-colour (noun), although that has never gained widespread acceptance. -- Visviva 10:11, 21 January 2008 (UTC)
The first part, though simplistic, sounds like a pretty good summary, although in truth they're thus far only duplicated when contentious, as too many words simply avoid our attention. We've had a number of hairy battles over the issue, and that's the best way to solve it. Technical solutions that transclude common sections are too complex.
(I should point out that, outside of the US/UK conflict, many times we do have alternative spellings that are not fully duplicated, which is not generally seen as a problem. Both pages exist, but one is essentially a stub, labeled as an alternative spelling of the other.)
The second part was just one person's suggestion. Personally I don't think you should vote between OED and Websters for primary spelling. If it follows either rule then it is a primary spelling in some part of the world. There's also the problem that they may not have exactly the same meaning, eg. program vs. programme. DAVilla 17:47, 21 January 2008 (UTC)
Sweet! That sounds like a great template. Say, if you could show me the error of my ways on the inaccurate statements/ assumptions I greenhorned in, both my talk page and I would be all ears. --Thecurran 16:07, 21 January 2008 (UTC)
Just a comment on "Webster's", which someone recommended as representing standard U.S. spelling. There's no one dictionary known as "Webster's": there are many. See [4] for more on that, and for three recommendations for good American-English dictionaries.—msh210 18:04, 21 January 2008 (UTC)

Mycenaean Greek

Anyone interested in Mycenaean Greek should check out a new discussion at Wiktionary talk:About Ancient Greek#Mycenaean.......Greek? Redux. All others, please disregard. Thanks. Atelaes 09:12, 20 January 2008 (UTC)


I have created a new competition, open to everybody. It is similar to one I've used at work. The winning entry there was about 2000 words though, but I'm not expecting anything that gargantuan. Enjoy, here. --Keene 13:24, 21 January 2008 (UTC)

IPA w:Alveolar trill vs. w:Alveolar approximant

I have noticed in Wiktionary (too many examples to name, but you can probably find one) that many times when the alveolar approximant appears in a word, it is represented by [r] (instead of the correct [ɹ]), which is actually the symbol for the w:Alveolar trill. I can understand how IPA can be difficult (I didn't have that problem, but some might). It has many symbols unfamiliar to people using non-Latin writing systems. However to use incorrect IPA is detestable to me. It confuses people who do and don't know IPA alike, (since the same symbol represents the trill), as to whether to use the trill or the approximant. This is not confusing to native English speakers (except with a new word or place name), but for non-English speakers learning English whose native language has the trill and the approximant (e.g.: Spanish), it is very hard to tell whether it's the trill or the approximant. On IRC, BadTypoDog recommended w:ASCII-IPA since it is easier, and some people don't have the fonts that display IPA correctly. And don't even get me started on names, which I have a hard enough time pronouncing just because of this factor.

Something needs to be done about this. --Ionas 19:26, 22 January 2008 (UTC)

I know we've discussed this to death over the past, oh, three or four years at least, but I wonder if it's ever been put to a vote, because quite frankly I agree with you. DAVilla 20:06, 22 January 2008 (UTC)
Now there is. I gave Wiktionary:Votes/2008-01/IPA for English r 45 days, seeing as so many people are opinionated on the subject. DAVilla 03:40, 23 January 2008 (UTC)
I don't think that dragging the vote out for a month and a half will reap any additional benefit. Most of us have seen both the pro and con arguments many times. What would help most (IMHO) is a redaction page of the pro and con arguments, then a relatively quick vote. I would guess that most people willing to change their opinion have done so already (one way or the other), but I have no sense what the outcome on the matter would be if we went to a vote. --EncycloPetey 04:02, 23 January 2008 (UTC)
Maybe 10 days would be enough? While most of us will vote the first week, many times I have seen people bring this up who are not one of "us". I'd rather start the vote early and leave it open long, than drag out the start date, since either way should conclude at around the same time. You're welcome to populate an arguments section in the meantime. DAVilla 04:21, 23 January 2008 (UTC)
It's worth repeating that there's nothing wrong with using /r/ to represent the phoneme when we transcribe English words broadly (as a sequence of phonemes), and that there are benefits to transcribing English words broadly, e.g. to show dialect-neutral pronunciations. Rod (A. Smith) 21:35, 22 January 2008 (UTC)
I myself see that phoneme/sound distinction as unnecessary; that is why it's called the International Phonetic Alphabet, not the English Phonetic Alphabet. We shouldn't use the phonemes in entries unless we can properly transcribe them (that is, /ɹ/ instead of /r/). We need to make it clear that [r] is not in our language.
You talk about a "set of phonemes". The sound [r] is not in the basic English phonemic set, so we should not use it (which makes it look like [r] is in English phonemic set). Just because [r] is not in basic English (what about names, Roderick) doesn't justify beïng too lazy to find a symbol and click a button. --Ionas 23:08, 22 January 2008 (UTC)
As you say, no major dialect of English has the sound [r]. So, phonemic transcription using IPA allows /r/ to indicate the phoneme that most speakers pronounce as [ɹ]. It's important to understand the difference between the two different types of transcription. Rod (A. Smith) 23:21, 22 January 2008 (UTC)
From a reasonably inexperienced viewpoint it does seem that Ionas has a point, if /ɹ/ is the symbol for the international phoneme, and Wiktionary tries to be an international dictionary, we should use /ɹ/. The /r/ character, as it seems reasonably flexible, would be fine if we only included English words - but it is confusing to me, and no doubt others, if we are using the same symbol to mean different sounds on different parts of Wiktionary. Conrad.Irwin 23:40, 22 January 2008 (UTC)
Consider the first phoneme in . Most English dialects pronounce it as [ɹ], but some pronounce it as [ɾ]. So, when showing the word's phonemes, we should choose the more generic symbol, i.e. /r/. Only when transcribing phonetically should we use the symbol [ɹ], and then only when we qualify it with a specific dialect. Rod (A. Smith) 23:46, 22 January 2008 (UTC)
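
Editorial illustration: a minimal Python sketch of the broad-versus-narrow distinction drawn in the comment above. One phonemic form with the generic /r/ is turned into dialect-specific phonetic forms. The example word ("red"), the dialect labels, the table `RHOTIC_PHONE` and the helper `narrow` are all assumptions made for illustration. Real allophony involves far more than swapping one symbol, and every other segment is left untouched here.

```python
# Assumed default realisation of the phoneme /r/ in each dialect label.
RHOTIC_PHONE = {
    "GenAm": "ɹ",          # alveolar approximant
    "RP": "ɹ",             # alveolar approximant
    "Scottish (tap)": "ɾ",  # alveolar tap, for speakers who use it
}


def narrow(broad, dialect):
    """Turn a /phonemic/ string into a [phonetic] one for the given dialect.

    Only the rhotic is changed; this is a deliberate oversimplification.
    """
    if not (broad.startswith("/") and broad.endswith("/")):
        raise ValueError("expected a phonemic transcription in slashes")
    phone = RHOTIC_PHONE[dialect]
    return "[" + broad[1:-1].replace("r", phone) + "]"


for dialect in RHOTIC_PHONE:
    print(dialect, narrow("/rɛd/", dialect))
```

The single broad form serves every dialect in the table, while each narrow form has to be labelled with the dialect it belongs to.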
That's a good argument insofar as we transcribe a range of pronunciations; Scottish English, for example, shows person-to-person variation between [r], [ɾ] and [ɹ], and when we bother to transcribe it, /r/ is probably sensible. However, we don't tend to transcribe pandialectally at all - we don't write /bɛr/ for 'bear' and apply it to all dialects, GenAm, RP, whatever, although we could, and just put the differences down to subphonemic rules. Instead, we say "GA has /bɛɚ/, RP has /bɛə/", and hopefully we'll add some more. You get the idea. So, while "[r] is not in the phonemic set of English dialects therefore we don't use it to transcribe the rhotic phoneme" is not a compelling argument, giving as the phoneme the default phone of the dialect or standard we're transcribing is a good idea. I mean, in many ways using /r/ is more transparent, since, a) native English speakers can pronounce the /r/ as they wish, they just need to know it has their /r/ in it, and b) it's unreasonable to expect a non-native speaker can, almost automagically, start pronouncing the sound as [ɹ] rather than as their native sound just by reading a transcription, and c) that they'll think English has more than one rhotic phoneme per dialect because we, er, use a different one to transcribe than the one they know is pronounced? I didn't really follow that argument. Anyway, it is, however, not sensible to transcribe the phoneme as /r/ for a dialect that never uses that phone when we have, on the one hand, English dialects that do (Scottish English, Irish English, Wenglish, Scouse - although, all of these can also have [ɾ] instead) and, on the other, different languages that use that phoneme and perhaps several others (incidentally, and correct me if I'm wrong, but Spanish doesn't have an alveolar approximant, it has a trill and a tap/flap).
We only need to show the minimal distinction of a language, but the phoneme should, ideally, reflect the default ('elsewhere') phone, especially when we've built ourselves such a handy input method and Unicode is nigh-ubiquitous. And, of course, when we have SAMPA (and I'd be happy to transcribe in Kirshenbaum too if we decide it's a good idea) to fall back on. --Wytukaze 00:08, 23 January 2008 (UTC)
There is no such thing as an international or pan-lingual phoneme; the distinctions between phonemes are arbitrary and unique to each language. -- Visviva 09:48, 23 January 2008 (UTC)

Your "GA has /bɛɚ/, RP has /bɛə/", etc. is what I like best, instead of using the totally wrong IPA. And I have to learn Kirshenbaum, I just don't think SAMPA is something I like. --Ionas 00:30, 23 January 2008 (UTC)

Well, just to be nitpicky, pernickety, pedantic, whatever, it's not, really, the "wrong" IPA. It's perfectly valid. We could transcribe all English rhotics as /ɧ/ or /ɞ/ or /zæŋflæks/ if we wanted, it's just that keeping it as close to the phonetic realisation as possible seems best to me, for our specific situation. And aye, a lot of people prefer Kirshenbaum (I don't care either way) but it seems to've fallen out of use in favour of X-SAMPA these days. --Wytukaze 00:47, 23 January 2008 (UTC)

This again (/r/)

It is pathetic to see this issue resurfacing again. The preposterous notion that anyone outside of linguistics can read IPA notation is so absurd, I'm not sure where to begin.

There have been numerous takes on what IPA is versus what it should be. The basic reality is that each flavor of IPA is flexible enough to accommodate its target audience. That's why the few abortive dictionaries in the US that have tried using IPA have pretty consistently used /r/. IIRC, someone mentioned the same for dictionaries in AU, right?

The use of IPA at all is POV. It is no less ambiguous than any other pronunciation scheme, it just has some severe proponents in this context.

While some language Wiktionaries have taken the reasonable step of allowing alternate views for pronunciation schemes, the same POV-pushers here have resisted any such efforts. Where's the code, folks? I see a plethora of sysops around, yet only five or six make any effort at all to even try to find technical solutions? Defaulting a view in IPA is just absurd.

Take your upside-down and backwards /r/ and begone. Come up with a real solution that accommodates all regional flavors first, then start bitching about how one particular character appears in one single view, appropriate only for a small segment of Europeans. Until then, use characters that have some possibility of being understood. Hell, how about a character that even renders on a default browser, without loading rare, obscure fonts?

--Connel MacKenzie 04:49, 24 January 2008 (UTC)

Why do you care how we write IPA if you don't want us to use it at all? And here's the code: {{SAMPA}} and {{enPR}}. But you know about those. You also know about using sound files. So why the vitriol? Writing something in IPA doesn't make SAMPA, or enPR, or using a sound file impossible. If you don't like writing (US) /ɹɛd/, then don't. You can write (US) Template:X-SAMPA, or (US) enPR: rĕd, or add a sound file. Cynewulf 09:59, 24 January 2008 (UTC)
That is a fallacious argument. By setting the pronunciation first (>99% only) to use IPA you (you plural) have set IPA as the default, despite it being inappropriate. The code to which I refer is transformation code that might render any supplied pronunciation (enPR, SAMPA or a flavor of IPA) into something a typical reader can read. We've had several iterations that I know of, with POV pushers removing anything not in their IPA dialect. The wrong /r/ has been used as justification for such vandalism in the past. Apparently, the mindset of "RP IPA or nothing" coincides directly with vandalizing /r/ to be upside-down and backwards. That, despite upside-down and backwards /r/ being inappropriate for 99% of our readers. --Connel MacKenzie 16:14, 24 January 2008 (UTC)
I was arguing for use of phonemic transcriptions with /r/ and phonetic transcriptions with [ɹ], but Connel's hyperbole and ridiculous cries of vandalism drown out the signal and my interest. Rod (A. Smith) 16:34, 24 January 2008 (UTC)
Connel, more people prefer IPA because it is international in a way that the phonological-transcription-formerly-known-as-AHD isn't. IPA is in use now on most wikipedias, on most wiktionaries, and in many major mainstream dictionaries (e.g. OED and Cambridge). SAMPA is a dumbed-down version rendered in ASCII and developed in the days when the internet was limited to ASCII. Most other systems are parochial. --EncycloPetey 05:51, 26 January 2008 (UTC)
Personally, I pretty much only ever use RP IPA. This may be seen as POV, but only because I'm only familiar with the IPA alphabet and am most familiar with that way of pronouncing (i.e. the regional accent closest to my own). --Keene 17:17, 24 January 2008 (UTC)
Do you claim, or is it true, that this character is the only IPA character we use that is not widely rendered? Because otherwise your argument sounds like one for scrapping IPA altogether. DAVilla 21:01, 24 January 2008 (UTC)

Since we are a multilingual dictionary, in the case of a language with both sounds, how do you tell which is being used (the trill or the approximant) if there is only one symbol to represent them? I agree with Rodasmith. Connel, the only thing that is pathetic is your behavior as a user trusted by the community. Has anyone contacted the International Phonetic Association about this? Why do we have two symbols for the Voiced Uvular Fricative/Approximant and the uvular trill but it is somehow correct to use this? I myself do not use SAMPA and do not care as long as IPA is not abolished. Cynewulf makes a good point: why do you care if you don't want to use it? And I am afraid if you use the [r] incorrectly to mean an approximant, either invent a new symbol for the trill so I can tell the difference, or just use the proper IPA.

You, Connel, are the POV pusher and cannot speak for "the typical reader". Yes, the [ɹ] is upside-down. This discussion never was about whether people can read IPA, and I think a template should specify IPA, SAMPA, and Kirshenbaum if there isn't already. --Ionas Freeman (自人) 20:41, 24 January 2008 (UTC)

Yeah, yeah, this again, Connel. I'm sick of it too. And I'm sorry, until relatively recently, I was outside of linguistics but I'd been able to read IPA for years before. I learnt. And plenty more can manage, just by reading the pronunciation key on an IPA-using dictionary, which are apparently more common stateside, even, than you realise. Hell, we have Wikipedia, you can get a damn good grounding in it just from reading the IPA articles. But besides, that's irrelevant. We have the cross-regional transcription scheme, it's enPR (which, incidentally, is therefore more ambiguous than our IPA transcriptions as a necessity). We use IPA to closely render the phonemes of various, hopefully somewhat standardised, dialects, for those who are interested. Not interested? Ignore them. And I agree, we should have the option to view, say, IPA, SAMPA or Kirshenbaum, localised transcriptions or a generic one, or all of them, none of them (no pronunciation section at all) or something in between. Can't read any of those and too stubborn to learn? Well, that's why we're adding soundfiles. I don't see the problem.
Yes, using IPA is a POV thing. Not using it is, of course, the same. That's boring and obvious. We use IPA because it's the most commonly used and accepted and encoded scheme. And there's no better reason. There are Americanised variants of IPA, but these are essentially for field transcription purposes (/š/ for /ʃ/, for example, because the latter might be confused with /s/ in handwriting), and as such European linguists use these and others as well, when they're writing by hand - we don't need to use these variants, in lieu of the international standard, because we're not in a rush and we're not using pen and paper. And yes, there are yet more transcription standards, most of them badly supported and very rarely used. enPR is our compromise for a set of these, the most commonly used set after IPA, because no dictionary agrees on the specifics. Want more? Add them. We don't have Kirshenbaum at the moment, for example, because we use SAMPA for our ASCII IPA scheme (again, it's currently more popular) so some argue it's redundant and because there're objections to making the pronunciation sections much bigger than they already are. Shame, but there you go. A customisable view (default showing everything, of course) might go a way to alleviating this.
Now, everyone else ready to get back on topic? Let's call this a discussion about what we already use, that is our GenAm and RP systems, which have [ɹ] (or sometimes [ɻ], especially in GenAm) as their default rhotic phone (not phoneme, which is, evidently, debatable) and so can be satisfactorily rendered - so long as we explain in the case of /r/, perhaps - with either symbol. /r/ or /ɹ/, people? Which do we prefer, regardless of anything else? (And do we really need to put it to a vote? That's so.. bureaucratic.) --Wytukaze 00:09, 25 January 2008 (UTC)
Oh, one more thing. This does apply to SAMPA as well. We can't really go transcribing /ɹɛd/ for IPA and then /rEd/ for SAMPA - the SAMPA should basically be an ASCII representation of the IPA on that page. --Wytukaze 00:18, 25 January 2008 (UTC)
By the way, didn't French Wiktionary come up with a way to translate one into the other? I can't see most of the symbols, whether in edit or on the page, unless the script template is used, so I would really like to have something from SAMPA to IPA. But I think they did it the other way around. DAVilla 11:05, 25 January 2008 (UTC)
The French Wiktionary has some JavaScript that can convert IPA to SAMPA. This is a lossy transformation, afaict, which would imply that the conversion can't always be done the other way. If we want this here we can get it, but it would require a lot of thought about exact implementation details. Conrad.Irwin 19:30, 25 January 2008 (UTC)
Good, let's start. And we can start with double checking. Are you sure about losing information, or can anyone else confirm? Remember we're talking about X-SAMPA here, which should be 1-to-1. DAVilla 11:21, 26 January 2008 (UTC)
I have imported their rules to create User:Conrad.Irwin/ipa2sampa.js, which does nothing interesting yet. It does translate both ǁ and ʖ into |\|\ and I think there are some similar cases, but I haven't yet learned to read this stuff well enough to know what each symbol means. It should however be reasonably easy for someone to change things; 'Symbol to find':"New symbol to use" is the format that it uses. Conrad.Irwin 21:47, 27 January 2008 (UTC)
I missed the tail end of this conversation, but did create {{ipa sampa}} by way of experiment with a server-side solution. Uses the regular SAMPA conversion table from Wikipedia atm, minus fancy stuff like nasals and rhotics; but the same approach should be equally applicable to X-SAMPA with better results. Unfortunately I don't see an easy way to integrate this approach with current standard Pronunciation layout... Anyway, just tossing it out as a possibility. Revise, comment, delete, whatever seems appropriate. :-) -- Visviva 10:42, 30 January 2008 (UTC)
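As a rough illustration of the rule-table approach discussed above (not the actual contents of ipa2sampa.js or {{ipa sampa}}), here is a minimal Python sketch. The table holds only a few symbols chosen for the example, the function names are hypothetical, and the shared output for ǁ and ʖ simply mirrors what Conrad.Irwin reports for the imported rules. It shows why a table with two inputs sharing one output cannot be reversed without loss.

```python
# Illustrative IPA -> X-SAMPA rules in the 'find': "replace" shape.
# Assumption: only a handful of symbols are listed; a real table is far longer.
IPA_TO_XSAMPA = {
    "ɹ": "r\\",      # alveolar approximant
    "r": "r",        # alveolar trill
    "ɾ": "4",        # alveolar tap
    "ɛ": "E",
    "ʃ": "S",
    "ǁ": "|\\|\\",
    "ʖ": "|\\|\\",   # same output as ǁ, as reported for the imported rules
}


def to_xsampa(ipa):
    """Convert symbol by symbol; unknown characters pass through unchanged."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


def collisions(table):
    """Return each output string that more than one input symbol maps to."""
    by_output = {}
    for src, dst in table.items():
        by_output.setdefault(dst, []).append(src)
    return {dst: srcs for dst, srcs in by_output.items() if len(srcs) > 1}


if __name__ == "__main__":
    print(to_xsampa("ɹɛd"))
    print(to_xsampa("rɛd"))
    print(collisions(IPA_TO_XSAMPA))
```

Any entry returned by `collisions` marks a spot where the reverse (SAMPA to IPA) conversion would have to guess, which is the "lossy" concern raised above.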

Please review/edit sample Hangul syllable entry

OK, I can see which way the wind is blowing. I don't much care for it, but if we're going to have all of these little pieces of non-lexical crap floating around in mainspace we may as well do it right. Feedback on User:Visviva/굫 sought. Cheers, -- Visviva 16:08, 23 January 2008 (UTC)

I'll do feedback here, I guess. All very good and useful, I'd say. One thing: I'd prefer "Hangul character" - it's a syllable in Korean (or would be), and it's composed of three other Hangul characters, sure, but you could use the character however you wanted. Sorta like how "shfeh" would be a Latin (alphabet) string, not a syllable, or a sequence of phonemes, or whatever. You get my point, I hope. Maybe more specifics would be necessary - 'Hangul composite character' or some such (Ligature? Not true to life, I suppose) - to differentiate it from ㄱ "Hangul character" or "letter" or whatever. --Wytukaze 01:48, 24 January 2008 (UTC)
Thanks. I went with ===Symbol=== for the POS header because ===Syllable=== is already in use for a different purpose (primarily for hanja readings), and chose "Hangul syllables" as the category because that's the name given to the codeblock in the Unicode (5.0) standard. Not sure it's optimal, though... something like "Hangul syllabic [blocks/characters/entities]" would be more descriptive.-- Visviva 02:34, 24 January 2008 (UTC)
I don't think it should say "a unicode hangul symbol", as that suggests that it does not exist outside of unicode (when, in fact, it is perfectly possible to write hangul in script). It is a unicode representation of a hangul symbol. Might also be nice to have just a handful of representational words using the character. bd2412 T 03:41, 24 January 2008 (UTC)
  • But don't get me wrong, I like what you're doing with it! bd2412 T 09:23, 24 January 2008 (UTC)
What is a "standard keyboard"? My IME gives 랴ㄱ when I type ryg, and 굫 is gyoh. Perhaps a reference link would be good?
I was thinking this would start with syllables that are parts of words, like 났 is. But I guess it won't hurt if you want to do all the code points. Cynewulf 10:09, 24 January 2008 (UTC)
That's interesting. What kind of IME do you have? r-y-g is the keystroke sequence on all (as far as I'm aware) keyboards here in South Korea, which have the same layout I get when I install the Korean IME on a non-Korean Windows system. "Standard South Korean" keyboard maybe? I have no idea what layout they use in North Korea, or how to find out. -- Visviva 14:08, 24 January 2008 (UTC)
Well, I'm of the opinion that there is no more of lexical interest to be said about 났 than about 굫 or any other random syllable block. (Note that the information currently in 났 is of a highly dubious character). So I figure if we're going to have any of these, it only makes sense to have the full set. -- Visviva 14:08, 24 January 2008 (UTC)
This kind of mapping, then, I guess. Dubeolshik? w:Keyboard_layout#Hangul_.28for_Korean.29 All my IMEs are part of scim, there's a lot of different ones. I've apparently installed one that uses South Korean romaja. (I type Japanese in ro−maji too. Less finger relearning.) Cynewulf 16:08, 24 January 2008 (UTC)
Yep, dubeolsik is the only kind you'll ever see here (except maybe if you go to a vintage typewriter museum or something). The WP article is a bit slanted; it's like putting Qwerty and Dvorak on the same footing. -- Visviva 16:45, 24 January 2008 (UTC)

Phrasal verbs template

Request for a new template {{phrasal}} which will automatically place the inflection header into a standard (needs to be decided what this means) format for a phrasal verb, and automatic entry into Category:English phrasal verbs and Category:English verbs. Any chance? -- Algrif 16:53, 23 January 2008 (UTC)

It would be helpful to have such a thing. The scope of it might depend on what level of standardization we could get agreement on. For example, could/should it generate entries for all the inflected forms that are not yet created? Should there be an idiom tag? Are phrasal verb entries supposed to replace sense lines in the associated verb entries that are "really" the phrasal verb? DCDuring TALK 17:19, 23 January 2008 (UTC)
I was thinking to use the template to replace {{en-verb}} or {{infl}} (the two mostly used at the moment). My POV on what level of standardization is to have simply each word separate, as the {{infl}} generates. It would be in line with sg= for nouns. I don't like a whole string of inflected phrasal entries when the base verb will do to get those. There would have to be some way to indicate that, for example, hang up inflects to hung up not hanged up for those substantial few verbs that have a choice. -- Algrif 17:47, 23 January 2008 (UTC)
To clarify, Algrif, are you saying that only where the phrasal verb has a preference or requirement for a particular inflection different from the base verb would you want to see the inflection made explicit at the lemma entry for the phrasal verb? Are you opposed to the creation of the inflected forms or are you opposed to them cluttering up the inflection line or the lemma entry? DCDuring TALK 18:02, 23 January 2008 (UTC)
I like it. I guess that it would be used something like {{phrasal|inf=[[shake]] [[up]]|shakes up|shaking up|shook up|shaken up}}, right? --Keene 18:45, 23 January 2008 (UTC)
I have just edited go through with as an example. The first lemma entry using {{en-verb}} is a real mess and difficult to extract useful info. The second is neat and the user can click link to go, but you do not get the information that the past participle uses gone rather than been. So an argument in the {{phrasal}} template would allow the generation of something like, I dunno, (Inflections) Use going, went, gone through with -- Algrif 18:56, 23 January 2008 (UTC)
The idea being to suppress, 1., the repetition of the preposition and, 2., the labels for the inflected parts of the verb? Given the apparent weaknesses of the WT search engine, how would a user who stumbled upon a non-lemma inflected form of the phrasal verb find the entry? I am amazed by some of the entries not found by searches. DCDuring TALK 19:15, 23 January 2008 (UTC)
I just searched running around with. run around with is near the top of the list, even though the entry has another popular type of lemma style, as you can see (One of mine, actually. Bleugh :-/) -- Algrif 19:27, 23 January 2008 (UTC) Later that evening .... OK, I see what you mean. With single particles it's not that hot, is it? However, users should, and do (I've heard) use the obvious lemma form and / or search the Category. It's what it's there for, after all. -- Algrif 19:35, 23 January 2008 (UTC)
I fear that we have to focus on the needs of users who are cranky, tired, sick, inexperienced, uneducated, and/or stupid. I like to think that I am not often all of those things, but I often have to run multiple searches and go through many screens to get what I want from WT.
I have had negative experience with searching for non-headword content in new entries, but that may have been because the entry's content hadn't yet been indexed. For irregular verbs, does the search engine put the lemma entry on the first screen a user sees if the inflected form in the search box doesn't have the same stem as the lemma? If not, then we would need to make sure that irregular verb forms were displayed and indexed for the lemma entry or that there were non-lemma entries that linked to the lemma (not necessarily the other way around). DCDuring TALK 19:48, 23 January 2008 (UTC)
While I acknowledge that most phrase and idiom entries fall short, there is a way to do this correctly, now. Using "===Alternative forms===" the most common forms of an expression are listed (with the most common form having an entry.) The Alternative forms are supposed to redirect to the main entry. This allows for alternate forms to be contested and/or discussed on the talk page of the main entry - if a particular form cannot be attested, it can be removed from the list of forms (while retaining the redirect to assist lookups.) AFAIK, no one has taken up that task (going through Category:English phrases or Category:English idioms). It is conceivable (but unlikely) that a bot could generate redirects for all inflected forms, for all multi-word terms, in those categories. Furthermore, "one's" could be "his", "her", "its", "your", "their" etc. My guess is that an initiative like that would meet a lot of resistance. (But who knows?) --Connel MacKenzie 05:13, 24 January 2008 (UTC)
How does the "traditional" style used at hold hostage look to you? {{infl|en|verb|head=to [[hold]] [[hostage]]}} Do we really need holds hostage, holding hostage, held hostage, holds hostages, holding hostages and held hostages? Do we truly need to mention those forms separately? --Connel MacKenzie 04:15, 25 January 2008 (UTC)
I have entered in both styles. I now follow my own rule of using infl for all verb phrases that include nouns and those that include only adverbs. I use en-verb for what I think of as true phrasals (with prepositions only). It is largely based on the length of the inflection line. I use en-verb on an exception basis if one or more of the inflected forms is already entered, indicating the increased likelihood that the term is commonly used. I can be influenced to use en-verb if the base verb itself is irregular, especially if it is a bit off the beaten track. To me the true phrasal verbs are the verb phrases that most warrant display of full inflections. It would be worth considering whether full inflections should be limited to irregular verbs and/or those with heavily used participial forms or some other special characteristics. DCDuring TALK 04:53, 25 January 2008 (UTC)


I feel that this template is a little excessive at the moment, though something like this should be noted somehow.


One solution to reduce the impact would be to reword it, something along the lines of

The term “Beer parlour” is considered a neologism based on standardized Wiktionary criteria.

Neologisms are terms that have only recently come into existence, they are unlikely to be widespread or well known. Their use in certain situations is likely to cause misunderstanding and may be considered illiterate.

would say a bit more about why there is such a huge warning, but I would prefer something much more subtle than this. (There is {{neologism}} for use inline with definitions, but there is concern that this doesn't go far enough). Thoughts about this anyone? Conrad.Irwin 20:27, 24 January 2008 (UTC)

There are two questions we need to ask — (1) What terms merit the message?, and (2) What should the message say about these terms? — but they very much affect each other. It's a thorny issue. —RuakhTALK 00:19, 25 January 2008 (UTC)
Well, (1) certainly not all the terms currently suggested by WT:NEO (i.e. anything that doesn't appear in a major dictionary). Suggest that we amend that page to say "not in any major dictionary and attested in use over a period of less than 10 years," possibly replacing 10 with some other small number. -- Visviva 01:22, 25 January 2008 (UTC)
O.K., but if a word is only attested from 1830 to 1835, is the term "neologism" still fitting? —RuakhTALK 01:34, 25 January 2008 (UTC)
Clearly the term "neologism" is not suitable for the casual visitor to Wiktionary in the kind of situation that Ruakh refers to. I thought that the "dated" tag was supposed to cover such things. DCDuring TALK 02:30, 25 January 2008 (UTC)
OK, how about not attested in major dictionaries and first attested less than (ten) years ago? Could take an argument for year of first attestation so as to automagically disappear/transform itself once the deadline was past. -- Visviva 03:08, 25 January 2008 (UTC)
Sure. Perhaps it could pop up on some kind of review list. Would we have many thousands of these? DCDuring TALK 03:48, 25 January 2008 (UTC)
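The rule Visviva proposes above (no major-dictionary entry, and first attested less than some number of years ago) can be sketched as a small check. This is only an illustration in Python, not template code; the function name, the parameter names and the default of ten years are assumptions taken from the wording of the proposal.

```python
from datetime import date


def needs_neologism_warning(first_attested_year, in_major_dictionary,
                            today=None, window_years=10):
    """Return True while a term still counts as a neologism under the proposal.

    Assumption: "less than N years ago" is measured in whole calendar years.
    """
    today = today or date.today()
    if in_major_dictionary:
        return False
    return today.year - first_attested_year < window_years


if __name__ == "__main__":
    when = date(2008, 1, 25)
    print(needs_neologism_warning(2003, False, today=when))
    print(needs_neologism_warning(1830, False, today=when))
```

Because the check depends on the current date, a warning driven by it would drop away on its own once the window has passed, which also answers the 1830 to 1835 case raised by Ruakh: such a term would never be flagged.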
I don't mind narrowing the scope for "neologism", but not for use of the template, such as {{warn|eye}} per my talk page, or other cases that are not neo terms. But that's just my two cents. DAVilla 10:58, 25 January 2008 (UTC)
The current warning is an outgrowth of many bitter arguments. The proposals above to drastically change the scope of the warning, perhaps based on a misleading warning name, I think are unwise. The wording change Conrad listed above (disclaimer: that he discussed on IRC before posting) I find satisfactory - it doesn't change the meaning of the template at all, only the verbosity. If some would like to offer a new measurement, with new uses, perhaps a separate approach should be taken for that. The English Wiktionary certainly could use a sweep of all terms that have only appeared in the last ten (or twenty) years. Tagging those terms with some new template (and/or category) with a much milder warning seems reasonable. But each entry that has Doremitzwr's tag on it was (obviously) fought over, with some sort of consensus reached. To randomly change the scope of the warning (presumably to just start removing it) would likely trigger significant retaliation from all quarters. --Connel MacKenzie 04:07, 25 January 2008 (UTC)
What about something more on the scale of the "big" WP link box? It could be high on the page without using 25% of the available space on a user's screen. It would be great if it could be made to fit in the white space to the right of the table of contents that our longer entries have. I suppose we could have three levels of warning: the current banner, the WP-size, and the sense-line. DCDuring TALK 02:30, 25 January 2008 (UTC)
Weren't we supposed to be deprecating all the floating boxes? Adding another to the mix doesn't seem very wise. --Connel MacKenzie 04:07, 25 January 2008 (UTC)
I dislike right floating boxes intensely, I have a wide screen and they are almost always floating way outside of the rest of the article. The information in this template should be in the "usage notes" section. I feel that we could have lots of warning templates, such as {{warn|trademark}} {{warn|archaic}} {{warn|offensive}} {{warn|obsolete neologism}} that could each add a short but relevant sentence explaining what the issues with the word can be. Conrad.Irwin 11:41, 25 January 2008 (UTC)
Do we have any relevant facts about user behavior in response to any aspects of our entry design? Don't we need some? Obviously we can't wait if we don't, but it would help a bit to get a better idea if what we were doing was helping. DCDuring TALK 11:56, 25 January 2008 (UTC)
Not as far as I know, it is a hypothetical aesthetics question, and thus highly dependent on individual preference. Whether someone actually uses Wiktionary is another matter entirely :). (When I browse wikt pages I am so used to having to ignore clean-up, rfd etc. templates that I generally don't read anything in a box until after I have found the definition) Conrad.Irwin 21:38, 27 January 2008 (UTC)

"article" vs. "entry"

Note: Previous discussion on this topic may be found at either Wiktionary:Grease pit#remnant of Wikipedia or Wiktionary:Grease pit archive/2007/December#remnant of Wikipedia.

This was discussed at the Grease pit, but as it's not really a technical issue, I think it deserves note here before being implemented: currently, the tab for main-namespace pages says "article"; however, most of us actually refer to those pages as "entries" (in keeping with the general English preference for "dictionary entry" over "dictionary article", though the latter usage does exist), leaving "articles" for their Wikipedia counterparts. So, how do people feel about changing the tab to say "entry"? —RuakhTALK 02:20, 25 January 2008 (UTC)

Positive (it makes for good distinction). Harris Morgan 02:27, 25 January 2008 (UTC).
It seems like a natural way of gently reminding visiting editors that the rules here are not the same as on WP. Drawbacks are not apparent to me unless it would be a lot of work. DCDuring TALK 02:33, 25 January 2008 (UTC)
Nope, no work at all; it's controlled by MediaWiki:Nstab-main. —RuakhTALK 02:41, 25 January 2008 (UTC)
  • I support changing the tab text from "article" to "entry". Rod (A. Smith) 04:16, 25 January 2008 (UTC)
  • Me too. -- Visviva 06:06, 25 January 2008 (UTC)
  • Support. DAVilla 10:42, 25 January 2008 (UTC)
  • Support. Me three. -- Algrif 10:47, 25 January 2008 (UTC)
    Watch it there, buddy... you four! DAVilla 10:49, 25 January 2008 (UTC)
  • Support, I had this on my todo list, but it must have fallen off. Conrad.Irwin 11:29, 25 January 2008 (UTC)
  • Support, of course. DCDuring TALK 11:48, 25 January 2008 (UTC)

Done. DAVilla 11:51, 25 January 2008 (UTC)


Would anyone object to my adding support for a nosig=1 option to {{support}}, {{oppose}}, and {{abstain}}? The idea would be that people could type {{subst:support|nosig=1}} comment ~~~~ to put a comment before their signature. Polling is evil, but polling+comments is less so. :-)   —RuakhTALK 02:27, 25 January 2008 (UTC)

  1.   Support Connel MacKenzie 04:10, 25 January 2008 (UTC)

Sorry, this doesn't work very nicely. If a template is substituted, then #if's are not evaluated, they are dumped. I would prefer to add a parameter as a comment, but then you have to always provide that parameter, or you have to live with unevaluated parameters in the resultant text, if you choose not to use a separate template. DAVilla 10:47, 25 January 2008 (UTC)

A better solution would be to allow people to put a comment in the template, {{subst:support|I have been looking forward to this for ages.}} and have it still sign automatically. It would leave {{{1|}}} in the subst:ed text though, but I feel that is better than leaving {{#if:{{{nosig|}}}||Conrad.Irwin 11:28, 25 January 2008 (UTC)}}
Yup. The only problem would be if the comment had an equals sign... or unmatched double closing brace, for that matter. When, oh when will we have wiki syntax 2.0? DAVilla 11:54, 25 January 2008 (UTC)
I considered that, but firstly had the same thought as DAVilla, and secondly was concerned that if someone messed up somehow (e.g. by having an equals sign as he mentions) then as soon as they saved, the entire comment would be subst:'d out of existence. And multiparagraph comments are easier with #: than with <p>. —RuakhTALK 13:10, 25 January 2008 (UTC)
What do you mean? {{User:Ruakh/Template}} works perfectly, leaving no weird stuff in the wikitext (see the talk-page, which was produced using it). —RuakhTALK 13:10, 25 January 2008 (UTC)
Ah, clever. I never did like nodot etc. though; I don't think the design is ideal. But I don't have any better ideas, apart from using just {{subst:support|nosig}} or even that or {{subst:support|}}. If it were a separate template, I'm not even sure what a good name would be. Anyways, I withdraw my objection. DAVilla 16:43, 25 January 2008 (UTC)
I don't like them either, actually (even though I think I'm the one who introduced them). Maybe {{subst:support|sig=no}} would be better? —RuakhTALK 19:08, 25 January 2008 (UTC)
Well there's a simple fix to avoid nodot that I'm not going to bring up again.
It's possible to make the voting templates work so that any value (or even the empty string) for parameters 1, sig, or nosig cancelled out the signature. DAVilla 11:01, 26 January 2008 (UTC)
I don't know what simple fix you're referring to, so please do bring it up again (or e-mail me a link, or something).
One option is to support a true sig= parameter, such that {{subst:support}} is equivalent to {{subst:support|sig=~~~~}}, and one can suppress the signature entirely using {{subst:support|sig=}}. While we're at it, we can add a 1=/comment= parameter, whose value is inserted between the bold word "Support" and the signature, and even a text= parameter, such that {{subst:support}} is equivalent to {{subst:support|text=Support}}. Maximum flexibility: the only mandatory components would be the # and the image. ;-)
RuakhTALK 16:14, 26 January 2008 (UTC)
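The parameter scheme Ruakh describes (text=, comment=, and sig= with defaults) can be modelled outside the template language to check that the defaults behave as described. This is a Python sketch only, not wikitext: `render_vote` is a hypothetical helper, and the image markup is borrowed from the replacement line quoted further down in this thread.

```python
# Assumption: the image markup matches the line DAVilla quotes later on.
SUPPORT_IMAGE = "[[Image:Symbol support vote.svg|20px]]"


def render_vote(text="Support", comment="", sig="~~~~"):
    """Build the wikitext line a substituted vote template would leave behind.

    Only the '#' and the image are mandatory; an empty comment or an empty
    sig is simply left out rather than producing stray spaces.
    """
    parts = ["#", SUPPORT_IMAGE, "'''" + text + "'''"]
    if comment:
        parts.append(comment)
    if sig:
        parts.append(sig)
    return " ".join(parts)


if __name__ == "__main__":
    print(render_vote())                         # default: signed, no comment
    print(render_vote(sig=""))                   # signature suppressed
    print(render_vote(comment="Long overdue."))  # comment before signature
```

With no arguments the result is identical to passing `sig="~~~~"` explicitly, which is the equivalence described above; what this sketch cannot model is the substitution problem DAVilla raises, since that is specific to how `#if` behaves under subst:.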
The simple fix for form-of templates is to remove the period. Unfortunately that would absolutely destroy all of the entries that use them, entries that were considered low-priority in the first place but which would all have to be updated in batch immediately to add the dot back in, lest those thousands upon thousands of sentence fragments be unstopped.
As long as we have something like a comment=. parameter that is by default listed like that in new votes, I don't see any trouble in simply using that, with no need for a parameter for signature. Otherwise I would be inclined to {{subst:oppose|nocaps|nodot|comment=the implicated complexity…|sig=DAVilla 16:27, 27 January 2008 (UTC)}}

I have stumbled across a non-obvious solution to both this problem and the problem of attributing comments of multiple paragraphs. The latter problem I have seen with those who do not vote often, anyone from Richardb to Paul G.

The use of {{subst:support}} would not change, but I would replace the template with:

# [[Image:Symbol support vote.svg|20px]] '''Support''' ~~~~

(possibly allowing a more complicated sig= signature, if anyone really feels it necessary). This places all comments on an indentation, which is by the way ignored if no comment is left. DAVilla 21:10, 16 February 2008 (UTC)

Incidentally, should we be using {{neutral}} instead of {{abstain}}? DAVilla 10:48, 25 January 2008 (UTC)

Abstain is better than neutral, it is the same part of speech as support and oppose for one thing :). Conrad.Irwin 11:28, 25 January 2008 (UTC)
(Sorry to split your comment.) The interpretation is different though. "Abstain" means you do not cast a vote, "neutral" means you do. It's come up a couple times in this vote and this one, when my misinterpretation and, in fact, misdefinition had led to conflict between myself and Encyclopetey. DAVilla 11:48, 25 January 2008 (UTC)
(no worries) Perhaps then {{comment}} would be better (I have no idea what purpose it is serving atm), if I want to abstain I won't vote unless I have something to say about it. Conrad.Irwin 19:17, 25 January 2008 (UTC)
It's currently used to give tooltip text, and format in a way that makes it apparent there's tooltip text: {{comment|inline text|tooltip text}} yields inline text. Only two entries use it right now. —RuakhTALK 21:34, 25 January 2008 (UTC)
Abstain is basically treated like Comment, except that Comment is a little weaker since you can vote for/against and still comment. Abstain is kind of like a comment that the voter chooses not to vote and wants to make that known. I'm not sure if we'd want to use Comment. It doesn't sound like a bad idea, but it wouldn't substitute for Abstain.
What I'm suggesting is that Neutral or some verb variant be used to express a vote that is neither for nor against, but still counts as a vote. This makes a difference for us since we use a supermajority to determine consensus, so a "neutral" vote is actually a little bit negative. With 2/3 desired, on a scale where oppose is 0 and support is 1, neutral comes in midway at only 1/2.
Essentially, since we want any decisions we make to clearly carry support, a declaration that one does not support a measure isn't a positive one. Consider, for instance, a vote with only two people in favor and everyone else neither in favor nor opposed. If everyone else abstained, then the two votes would carry the day despite the fact that a single voter could make up his/her mind and stagnate the result. However, if everyone else voted neutral, then it would be more apparent that the measure did not carry the support that it needs, and the vote would be snubbed like a candle. Four neutral votes would be enough to counterbalance the two votes of support.
The question could also arise in issues of quorum, though we don't have any guidelines for that. DAVilla 10:49, 26 January 2008 (UTC)
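The arithmetic behind the neutral-vote proposal can be checked with a small sketch. It assumes the scale given above (oppose 0, neutral 1/2, support 1); the function name `support_score` is made up for illustration, and whether a score of exactly 2/3 passes is not settled in the discussion.

```python
from fractions import Fraction

# Weights on the proposed scale: oppose = 0, neutral = 1/2, support = 1.
WEIGHTS = {
    "support": Fraction(1),
    "neutral": Fraction(1, 2),
    "oppose": Fraction(0),
}


def support_score(support, neutral, oppose):
    """Average weight over all votes cast, as an exact fraction."""
    total = support + neutral + oppose
    if total == 0:
        raise ValueError("no votes cast")
    points = (support * WEIGHTS["support"]
              + neutral * WEIGHTS["neutral"]
              + oppose * WEIGHTS["oppose"])
    return points / total


THRESHOLD = Fraction(2, 3)
for neutrals in (0, 4, 5):
    score = support_score(2, neutrals, 0)
    print(neutrals, score, score > THRESHOLD)
```

Two supports with four neutral votes land exactly on 2/3, so under a strict "more than 2/3" reading the measure no longer carries; a fifth neutral vote pushes it below 2/3 under either reading.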
I disagree in two respects: Firstly, if a few people want to do something, and absolutely no one objects, then it's really obnoxious to say, "Well, no one minds what you want to do, but even so, we're not going to let you do it, because you're just two people and we don't care what you want." Secondly, to me "neutral" has exactly the same connotations as "abstain": indeed, I'm pretty sure I've even abstained before with a comment to the effect of "I'm neutral on this, but wanted to point out […]." —RuakhTALK 16:24, 26 January 2008 (UTC)
I see nothing wrong with keeping our current use of {{abstain}}. I think it is obvious to all viewers that someone who uses that template is "voting neutral" rather than purely abstaining (which could be done by not editing the page). And, yes, it's the same part of speech as support and oppose.—msh210 18:41, 29 January 2008 (UTC)

Wiktionary:About Swahili

Since Swahili has unusual features such as noun classes, which are on many "inflection" lines or translation table entries, it would be nice to have an "about" page to explain such issues and their formatting. Could one or more of our resident Swahili experts please create Wiktionary:About Swahili? hippietrail 05:18, 25 January 2008 (UTC)

Do we have more than one Swahili expert? -- Visviva 06:05, 25 January 2008 (UTC)
Yes, that would be a good idea. High time I did a lot more work on Swahili. The noun classes are a feature of the Bantu language family. (What they are doing in translation tables I don't know, and no-one uses the numbers outside of academic literature, where there are at least two numbering systems ;-) Robert Ullmann 06:24, 25 January 2008 (UTC)
I was working on the interface translation (see sw.wikt, or set language to sw in preferences here or anywhere), and now need to do more. The next thing is {{sw-conj}}, which I have been putting off. There are 7 simple forms (infinitive, imperative, imp. plural, neg. imp., neg. imp. plural, subjunctive, neg. sub.) but then there are 11 tenses, 3 persons for the M-WA class, 5 other classes, and each singular/plural. Total of 183 forms. Oh, and then NGE, NGELI, NGALI tenses, and relative forms. (I think separate tables would be good ...). Then derivative forms. (la eat, liwa be eaten; nywa drink, nywewa be drunk) those are separate entries. Oh, and forgot object markers. (;-) Robert Ullmann 12:27, 26 January 2008 (UTC)
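The figure of 183 can be reproduced from the counts in the comment above. This is just one reading of how the numbers combine (the variable names are illustrative), and it leaves out the NGE/NGELI/NGALI tenses, relative forms and object markers mentioned afterwards.

```python
# Counts taken from the comment above; how they combine is an assumption.
SIMPLE_FORMS = 7        # infinitive, imperatives, subjunctives
TENSES = 11
SUBJECT_SLOTS = 3 + 5   # 3 persons of the M-WA class plus 5 other classes
NUMBERS = 2             # singular and plural


def basic_form_count():
    """Simple forms plus one form per tense, subject slot and number."""
    return SIMPLE_FORMS + TENSES * SUBJECT_SLOTS * NUMBERS


print(basic_form_count())  # 7 + 11 * 8 * 2 = 183
```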

Interrogative pronoun templates

What is the difference between the {{infl|hu|pronoun|interrogative}} template used in the inflection line and {{interrogative|lang=hu}} template used just before the English definition of the word? Do we need both in one entry? Neither puts the word in Category:Hungarian interrogative pronouns. The category has to be added separately. --Panda10 11:49, 25 January 2008 (UTC)

The infl template adds an inflection line under the POS header. The interrogative can be used even if a language-specific header is used. We have not been consistent in the way we format pronoun entries. --EncycloPetey 05:42, 26 January 2008 (UTC)
Thanks. --Panda10 12:29, 26 January 2008 (UTC)

Lemmatizing Sanskrit

So basically the problem is this: for some substantives, the lemma form which usually occurs in the dictionary is never attested in actual written documents, because of this little thing called w:sandhi. For example, in nominative singular there are actually two forms for the nouns ending in transcribed '-s' (whose declension you can see e.g. at w:Vedic Sanskrit grammar). Adding entries that never actually occur smells like trouble to me (nominative singulars would then be derived forms). Sanskrit wiktionary is basically dead, and other ones seem to mix both the traditional dictionary forms and the actual phonetic ones (usually choosing nominative singular version that ends with visarga "ः"). However, departure from 2500 years of written Sanskrit grammar tradition is something at least worth discussing. I personally have no stance on this matter, so adding either "patis" or "patiḥ" will satisfy me. --Ivan Štambuk 03:02, 27 January 2008 (UTC)

Let me start with a broader perspective. In my opinion, we should be looking at attestation of overall entries from several different angles. First, is the term citable as defined for its part of speech? Then, are the spelling and/or pronunciation attested? Finally, are the inflected forms or related issues like countability attested? (This still doesn't get into attestation of context like regional use, or acceptance of spelling or pronunciation, etc.)
Unfortunately, these are impossible issues to separate. Primarily, it is difficult to link spelling with pronunciation when the term itself is in need of attestation, so really at least one or the other has to be attested at the same time. Here we seem to require spelling in every case since that is the way we catalogue our entries, but despite that there is already some openness, extending naturally from transcription, for terms that can only be cited orally, as would apply to languages that do not have their own writing system, or to languages such as English where authors are praised for documenting the way people actually speak. Traditional languages, in contrast, raise eyebrows. But that's a separate problem.
Getting back to your question, consider necropost as a verb. From quotations like "apologies for the necroposting" and "necroposting is tolerated", it's not clear that this could be used as a verb in other forms, so these could not count for the part of speech, although conveying the correct meaning they could attest the inflection of a root form, if that existed, or at least the functional use of necroposting as a noun. On the other hand, "necroposting a thread long considered dead", although not used in the base form, is clearly used as a verb, the gerund thereof, and should count as citation for both the part of speech and for the inflected form. What it doesn't attest is the base form, which is the problem you raised. My take, then, is what a waste it would be to have an attested term but not have anywhere to put it.
Granted, necropost will probably fail with the stricter criteria, but let's draw an analogy to your situation with a hypothetical. What if there were two quotations for necroposting, two quotations for necroposts, and two quotations for necroposted, all clearly used as verbs, but none for necropost? Certainly we wouldn't say that none of these were attested because they couldn't be so individually? But then what would these be attestations of? The answer clearly is necropost, and I could come to no other conclusion than to put the term there, even if neither the base form nor any of its inflections could be attested, or indeed even if the base form were never seen in use. And seriously, would we be any more satisfied with that decision to see each of to necropost, will necropost and should necropost, but not the simple present in plural or first/second person? DAVilla 16:02, 27 January 2008 (UTC)
I agree with that, I think...for me the answer is basically: put them under the standard lemma form whether it's attested or not. That's what I've been doing for Old English, although when that happens I would add a Usage Note to explain. Widsith 16:32, 27 January 2008 (UTC)
Good points, DAVilla. I've already raised an issue of unattested forms of extinct languages a while ago here in WT:BP, which unfortunately didn't attract many replies, about inflected forms from which base forms:
1) could not be reconstructed with 100% certainty (e.g. declensional patterns were in the process of falling apart, like in OCS кръі (krŭi) which is even today mostly wrongly lemmatized as accusative singular кръвь (krŭvĭ))
2) it is impossible to deduce which inflectional pattern is in question, and deduce proper lemma for it, so instead guessing the lemma just the attested form is listed.
However, this problem is not like that. An example; see , and the corresponding entry on br.wikt: br:अग्नि, and especially its declension table. In nominative singular it's listed like , not like its {PAGENAME} (sandhi rules also allow other versions as well), and what's more important, none of the forms listed in declension table is identical to {PAGENAME}! So basically it's not the problem to deduce lemma from the actually attested forms, but in adding lemma form that is never attested as used (not mentioned!) in written texts, but is usually the form listed in dictionary. See it e.g. in Monier-Williams dictionary entry online, which has an entry as . As I mentioned, some wikt's use the form with "visarga" sign, like tr:अग्निः :)
I'm primarily concerned with the way this would conflict with attestation policy of actual usage of terms, and how weird it would look to see nominative singular(s) listed in Category:Sanskrit noun forms ^_^.
For the verbs, it's also usual to lemmatize them as third-person active present singulars (besides "root" form which also never really occurs), so there's no problem with them. --Ivan Štambuk 17:28, 27 January 2008 (UTC)
I've been facing a similar decision with some Ancient Greek contracting verbs, such as φιλέω. Ancient Greek verbs with vowels immediately before the final omega generally contracted to an omega with circumflex. The situation is slightly different from yours in that the uncontracted form is often attested in earlier authors (but sometimes not). Nonetheless, I think it best to go with tradition on this one, which is definitely to list under the uncontracted form. Atelaes 18:57, 27 January 2008 (UTC)
It appears to me that the "tradition" should be followed here as well. However, issue has (had?) to be settled before inflection templates (which would presumably rely heavily on {PAGENAME}) could be made. The drawback is that the Sanskrit will be the only language with nominative singulars as "derived forms", and that some of those inflected forms will be interwikied to lemmata of some other wiktionaries. --Ivan Štambuk 21:44, 27 January 2008 (UTC)

CFI: systematic names of chemical compounds, encyclopedic rather than dictionary?

Are the names of every chemical compound, when they are named following the systematic rules, in the CFI as needing entries? I'm not meaning old terms like blue vitriol which is an obsolete and obscure term for Copper(II) Sulfate. I'm talking like arsenic triiodide, which if you know a little chemistry is obviously AsI3. It seems that might be more fitting for the Wikipedia than the wiktionary and could be covered by a reference to the systematic nomenclature for chemical compounds. (An exception for common terms like sodium chloride, which is used as a fancy way of saying table salt.)

The reason I ask is because the entry arsenic has a long list of red links to names of chemical compounds and I'm wondering if I should spend the time to fill them out when about all that can be said is the expansion of the systematic name. RJFJR 15:12, 27 January 2008 (UTC)

If you want to fill them out, you have my support in keeping them. What do others think? DAVilla 15:16, 27 January 2008 (UTC)
They are certainly fitting for inclusion here (I have done quite a few already) - but I should say that they are of very low priority. People are much more likely to look on -pedia first for them. SemperBlotto 15:25, 27 January 2008 (UTC)
On the other hand, what better way to define a common chemical name, such as the ones DCDuring mentions, than to give a more standard, SoP name? DAVilla 19:32, 27 January 2008 (UTC)
I wouldn't think that most of those red-links are high priority, whether or not they ultimately ought to be included. On the other hand there are plenty of folks who would like to know a few alternative names for things to help them search, but don't know the elements of chemistry (like I - iodine). It would be nice if we had some entries about scientific and technical nomenclature that referenced WT entries, WT appendices, WP articles, and external sources at graded levels of complexity. The kinds of actual substantive entries that would be high-priority would be ones that took common names (and proprietary names???) and provided the generic and chemical name. Nutrient chemicals, pharmaceuticals, significant pollutants, household chemicals, major industrial chemicals, any material with an ancient name all seem worthwhile. I wonder whether the same thinking applies to all of the technical terminology that we might consider including. DCDuring TALK 15:34, 27 January 2008 (UTC)
strikes me as SoP, but if someone wanted to add them, I wouldn't object. If nothing else, it might be useful to have synonyms, projectlinks, and so on. (But we certainly need terms like : they're SoP if you recognize the prefix, but there's no reason to assume that people do.) —RuakhTALK 16:55, 27 January 2008 (UTC)
I'd agree that they are low priority but worth keeping. Even though the name is assembled according to regular rules in most cases, the resulting words are not SoP in most cases. Sodium chloride is a salt, not a mixture of sodium metal and chlorine ions, and has different properties from the individual components. Lexically speaking, the synonyms and translations will also not be predictable from the individual components of the name, so again, I don't see these as merely sum of parts. --EncycloPetey 18:57, 27 January 2008 (UTC)
This seems like a very bad idea, unless there is some obvious red line between cases like arsenic triiodide and the high-order-of-infinity possible number of chemical names that can be constructed in IUPAC nomenclature. I don't much care for anything that could potentially justify someone running an entry-adding bot at full speed for all eternity. Of course our entries for arsenic, sodium, iodide, triiodide, etc. should mention that in chemical compounds these words generally refer specifically to the anion/cation. -- Visviva 09:20, 28 January 2008 (UTC)
If we had limitless resources, why couldn't WT be a chemical dictionary too? Could we recruit a few chemists to populate and police it? Would anyone use it?
Do we really need to prohibit anything here? Would it be more useful to characterize desirable features of entries in this field and the entries that deserve priority attention? DCDuring TALK 12:36, 28 January 2008 (UTC)
The simplest approach is to allow human-generated entries, but not encourage them. I could write a bot to pair up every cation with every anion - but it would be a gigantic waste of resources, and would generate compounds that nobody has bothered to create (e.g. molybdenum ricinoleate). Probably best to concentrate on those that get requested, have existing red links, or have entries already in -pedia. SemperBlotto 12:46, 28 January 2008 (UTC)
Could you create a bot to create only those pairs that are, in fact, red links or 'pedia entries? Also, as to the original question, I think we should have them all. bd2412 T 15:50, 28 January 2008 (UTC)
Your hypothetical bot would likely pair cations and anions that chemically react, which is why we should trust humans to make the entries. In theory, each of these needs to be citable. DAVilla 22:46, 28 January 2008 (UTC)
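The narrower bot asked about above (pair cations with anions, but keep only names that are already wanted) could look roughly like the following. It is a sketch under assumptions: the function name `candidate_names` and the sample lists are invented, names are formed naively as "cation anion", and nothing here checks that a compound exists or is citable, which is the objection raised in the reply.

```python
from itertools import product


def candidate_names(cations, anions, wanted):
    """Yield 'cation anion' names only if they appear in the wanted set.

    `wanted` stands in for existing red links or encyclopedia titles;
    how that set would be gathered is outside this sketch.
    """
    for cation, anion in product(cations, anions):
        name = f"{cation} {anion}"
        if name in wanted:
            yield name


# Illustrative data only.
cations = ["sodium", "molybdenum"]
anions = ["chloride", "ricinoleate"]
wanted = {"sodium chloride"}

print(list(candidate_names(cations, anions, wanted)))
```

Filtering against the wanted set keeps the bot from generating every possible pairing, such as molybdenum ricinoleate, while still leaving attestation to human editors.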
I would be much happier with someone (human) working toward the object of having either a WP article or standard chemical nomenclature at the end of a chain (preferably short) of definitions starting with some common words like "air", "bleach", "vitamin A", "brass", "sand", "peanut butter", and even "Tylenol" and "Prozac". Some of our definitions are solely descriptive or functional and could use supplementation in the direction of composition or manufacture. DCDuring TALK 00:09, 29 January 2008 (UTC)
I suggest a CFI modification for multi-word chemical names (like arsenic triiodide) that allows them if there are three cites, etc., that discuss the compound not in the context of a larger discussion of similar compounds. That is, if there's some chemistry-journal article on salts, and it refers to arsenic triiodide inter alia, that's not good enough; we need an article on arsenic triiodide, or on it and very few other compounds. (Note that I'm guessing here that arsenic triiodide is a salt. I have no idea.) This would nearly automatically allow any compound name that has entered the lay lexicon (e.g., acesulfame potassium and sodium chloride), incidentally, but would also allow others.—msh210 07:50, 29 January 2008 (UTC)
Most systematic names are SoP. The definition of arsenic trichloride is "a chemical compound containing three atoms of chlorine for each atom of arsenic", not the short description that is currently on Wiktionary. Translations of such names are regular. However, terms like butter of arsenic are obviously not SoP, and Wiktionary can add useful dictionary information which is missing or difficult to find on WP. Physchim62 10:50, 3 February 2008 (UTC)

Homophones vote

I have started a vote to establish a new L4 Header ====Homophones==== for the Pronunciation section. The vote is simply on whether or not to have the new header and section, and there is archived discussion on this subject at: Wiktionary:Beer_parlour_archive/2007/April#Homophones_as_a_L4_header.

I have optimistically started a discussion at Wiktionary_talk:Pronunciation#Homophones_section on possible format, should this proposal be approved. Once the format is agreed upon, it will need a vote to put it into WT:ELE. --EncycloPetey 21:00, 27 January 2008 (UTC)

I always thought that "homophones" already was included in CFI. It's so common around here anyway. --Keene 21:20, 27 January 2008 (UTC)
I think you mean ELE. It's not advocated in ELE as a header in its own right, but you are correct that it is commonly used despite not being advocated in ELE. --EncycloPetey 22:31, 27 January 2008 (UTC)
Sounds good. I too had the impression it was already formal. --Connel MacKenzie 17:49, 28 January 2008 (UTC)

New etymology discussion

I have posted a new discussion at Wiktionary talk:Etymology#New template concerning a new template for etymon languages which is meant to replace {{L.}}, {{AGr.}}, etc. Would everyone who is interested please take a look and leave any relevant comments. Many thanks. Atelaes 04:37, 28 January 2008 (UTC)

Irish & Scottish Gaelic - page titles and the article

There seems to be a lot of confusion in many Wiktionaries I’ve looked at lately about naming pages in these languages. To me, it’s reasonable to follow standard dictionary practice and not include the article in the page title, but a few entries that do still pop up here and many do elsewhere.

This is especially a problem with country names since many of them require the article when actually used and many of them are feminine which means the initial consonant may be lenited. A couple of examples...

  • France: Frainc / An Fhrainc 'the France'
  • Germany: Gearmáin / An Ghearmáin 'the Germany'
  • Denmark: Danmhairg / An Danmhairg 'the Denmark'
  • Austria: Ostair / An Ostair 'the Austria'
  • Wales: Cuimrigh / A' Chuimrigh 'the Wales'

Sometimes (though I think not here) you will also see pages called 'Fhrainc' which is just about as wrong as it can get, about equivalent to writing something like 'a single children' in English.

Another issue with country names is that while many require the article, some never use the article, so it would be good to give some indication on the page.

So I propose (whether I have time to do it is another matter) redirecting any current pages with the article to pages without, but including the article (not bold) under the noun section where appropriate.

Can anyone see any problems with that? ☸ Moilleadóir 05:42, 28 January 2008 (UTC)

How about having both the normal and articled form in the entry (seems appropriate), and have a "lenited form of" template instead of redirects? Circeus 16:36, 28 January 2008 (UTC)
In my opinion, the content should, as noted, be located at the Frainc pagenames; I agree with Circeus that the article forms should not be redirects. We have to decide, however, whether to use a template or to simply delete the An Fhrainc pages — and I can see a strong case for deleting them. In any case, the entries should note somewhere (either in the section under the noun header, or in usage notes) the forms of the words, with articles and without them, including any lenition. — Beobach972 16:44, 28 January 2008 (UTC)
Note that in English — regarding the obligatory inclusion of the articles — one does not normally speak of White House, one speaks of the White House, but the dictionary form is, as far as I know, White House, not the White House. — Beobach972 16:47, 28 January 2008 (UTC)
While I know nothing about Gaelic, from your description, Moilleadóir, it sounds like Fhrainc is a word, albeit one found only in certain contexts (namely, after an article). I see no reason we shouldn't have an entry for it. (People will look it up.)—msh210 17:06, 28 January 2008 (UTC)
Thank you, msh210 — your comment makes me realise that I failed to be clear when I suggested that some of the pages be deleted. I expect that many contributors might object to pages like An Fhrainc (ie, pages with articles, for the same reason they would disallow the White House), but I do think Fhrainc should exist. — Beobach972 17:28, 28 January 2008 (UTC)
It’s obviously a little hard to explain, but seeing a lenited word by itself looks deeply, deeply wrong. Lenition only ever occurs in context (there are a few). There really is no equivalent in English that I can think of. Hmmm...maybe a better analogy would be to say that we should have an entry -house because there can be a hyphen in tea-house. Just because it happens in context doesn’t mean it deserves its own entry.
If it was a choice between having pages with the lenited form and pages with the article (in the title), I’d go for the article!
If I ignore how wrong it looks to me, I can see a certain amount of logic in creating 'lenited' pages, but it would have to be bot work. Taken to its logical extreme for Irish this would mean this list of variations (lenition, eclipsis & other prefixation caused by the article): A (tA, nA, hA), a (t-a, n-a, ha), B (Bh mB), C (Ch gC), D (Dh, nD), E (tE, nE, hE), e (t-e, n-e, he), F (Fh bhF), G (Gh nG), I (tI, nI, hI), i (t-i, n-i, hi), M (Mh), P (Ph, bP), S (tS, Sh) & T (Th, dT). And for Welsh, with its three mutations you’d probably have much more.
While this might make sense as an abstract standard, it doesn’t seem reasonable to expect the average contributor to maintain it.
So long as someone doing a search for Fhrainc or an Fhrainc (or indeed Fraince or na Fraince) finds the entry Frainc, where is the problem? ☸ Moilleadóir 02:18, 29 January 2008 (UTC)
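To give a sense of how many pages the 'lenited' approach implies, the list of variations above can be turned into a lookup table. This is a purely orthographic sketch: the table copies only the initials listed in the comment, the name `mutated_forms` is invented, and it makes no attempt to decide which mutations a given word actually takes.

```python
# Initial-letter variations copied from the list above; letters the
# comment does not mention are deliberately left out.
INITIAL_VARIANTS = {
    "A": ["tA", "nA", "hA"], "a": ["t-a", "n-a", "ha"],
    "B": ["Bh", "mB"],
    "C": ["Ch", "gC"],
    "D": ["Dh", "nD"],
    "E": ["tE", "nE", "hE"], "e": ["t-e", "n-e", "he"],
    "F": ["Fh", "bhF"],
    "G": ["Gh", "nG"],
    "I": ["tI", "nI", "hI"], "i": ["t-i", "n-i", "hi"],
    "M": ["Mh"],
    "P": ["Ph", "bP"],
    "S": ["tS", "Sh"],
    "T": ["Th", "dT"],
}


def mutated_forms(word):
    """Return every spelling variant the table allows for the word's initial."""
    if not word:
        return []
    variants = INITIAL_VARIANTS.get(word[0], [])
    return [prefix + word[1:] for prefix in variants]


print(mutated_forms("Frainc"))
print(mutated_forms("Gearmáin"))
```

Each base entry picks up two or three extra page titles this way, which is the maintenance burden the comment is worried about.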
"Form of" pages are actually one of our big advantages over old-fashioned print dictionaries; you can look up, say, and see exactly what kind of form it is of exactly what Spanish verb. These entries are generally added by computer programs called "bots" that editors write, so don't feel like you need to add them yourself! (More generally, there's nothing that you should feel obligated to work on here; if something doesn't interest you, don't worry about it, as there will always be someone who is interested in it. Or in writing a bot to handle it.) —RuakhTALK 02:40, 29 January 2008 (UTC)
Even with bots, I’m not sure that mutations in Celtic languages are worth considering as 'forms'. Actual declensions and conjugations are a different matter, though you could argue that they are closely linked.
While I admire your optimism, I don’t think there always is someone interested when it comes to minority languages.
In any case, what I am interested in is sorting out the simplest standard for country names regardless of what baroque bot work might be done later. ☸ Moilleadóir 02:58, 29 January 2008 (UTC)
The problem with redirects is that we have to assume that all character strings can potentially be a word in more than one language. For consistency's sake, we handle this with things like {{form of}}. If we made some lenited forms redirects where they don't correspond to a word in another language and others could not be redirects because the same string of characters is a word in a different language, the reading and editorial experience would both be fragmented. Our solution on English Wiktionary is to entirely avoid the use of redirects in the main namespace. Mike Dillon 02:48, 29 January 2008 (UTC)
That’s a good point. Still not sure it justifies creating many, many more entries just because they might exist in some language. ☸ Moilleadóir 02:58, 29 January 2008 (UTC)
While I don't think that anyone should be expected to create these entries, I don't see any harm in having them if they're properly marked with something like the templates you created. As msh210 says above, people will search for them and we'd like them to be able to find the actual word. Now admittedly, someone that knows enough Gaelic to want to look a word up probably knows the lenition patterns, but not knowing a language doesn't stop people from trying to puzzle out the meaning of foreign phrases. All this being said, I guess there is a fine line when it comes to helping people who don't know a language; if you take something like Thai where there aren't usually spaces between words, I'm not sure how much you can help someone who doesn't know where to separate one word from the next. Mike Dillon 05:12, 30 January 2008 (UTC)
'Form of' templates {{lenited}} and {{eclipsed}} created. ☸ Moilleadóir 04:16, 29 January 2008 (UTC)
I'm cool with just moving them. The redirects you can delete if you feel like it. They're probably not going to get in the way anytime soon though. DAVilla 07:58, 29 January 2008 (UTC)


Could we have this hierarchy, as a subset of Category:Derivations? __meco 10:56, 28 January 2008 (UTC)

I don't see why not. What do others think? — Beobach972 22:32, 28 January 2008 (UTC)
I think the idea has merit, however there are a number of complications. First, does one borrowing make all the previous etymons count as loan words? For example, at synonym. The first etymon, synonymum, is a borrowing from Latin. The next etymon, συνώνυμον is a Latin borrowing from Ancient Greek. Would synonym thus go under Category:Latin borrowings and Category:Ancient Greek borrowings? That would be my impulse anyway. A parameter could be put into {{etyl}} so that the editor can specify whether the word should go into descendants or borrowings. We should also decide on wording. Perhaps we could use "descended from" for descendants, "borrowed from" for borrowings, and "from" for unspecified. Finally, how will this affect the L4 "Descendants?" Would we create a separate L4 for borrowings, or would we (my preference) simply specify as follows:


  • English: xxx (borrowing)
  • Spanish: xxx (descendant)
  • Italian: xxx

In this case the Italian is unspecified. It could be a borrowing or a descendant. Finally, we may want to consider transplanting this entire conversation to Wiktionary:Etymology. That's all the issues I can think of for now. Atelaes 22:55, 28 January 2008 (UTC)

I'd want to be sure we had a working definition for "loanword", "borrowing" etc. That way we're all sure to be discussing the same thing. In previous discussions on this issue, I've gotten the impression that we don't all mean the same thing when using these words.
It would also be nice if (whatever we decide), we then create a template for the Descendants section that will link to the specified language, allow for an alt= display form, etc. the way that {{t}} does for Translations and {{term}} does in the Etymology. This template can display "borrowing", "descendant", or whatever, and could be monitored by bot for missing information such as in the Italian example above. --EncycloPetey 23:38, 28 January 2008 (UTC)
I don't think it's really necessary to make the distinction between inherited lexemes (i.e. those that come from "olden" version of a language) and those that are taken in unchanged/adopted form from another language. The distinction between those two is pretty much implicit. Any word originating in Middle English is effectively inherited (i.e. a direct descendant), all the other ones are loanwords (unless they're made up or something). Words of Middle English period not present in Old English are loaned with respect to Old English, and so forth. For constructed languages such as Esperanto, all of lexemes in their Etymology sections are loanwords. Common Slavic word * is of Germanic origin, loaned at least 1500 years ago, but subsequently exhibited lots of phonological changes that gave modern Slavic versions. It's pretty much implicit at which stages the lexeme was borrowed, and at which ones inherited. So far I've seen (and practiced myself) in L4 Descendants header usage in both senses, and I think it would be bad to see them cluttered with self-explanatory glosses as "borrowing" or "descendant". It would be just needless complication of present system that functions just fine. --Ivan Štambuk 12:33, 29 January 2008 (UTC)
What about Spanish words from Latin? Most of them are inherited, but modern borrowings are different. The "borrowing" tag seems important in such cases, doesn't it? Rod (A. Smith) 17:34, 29 January 2008 (UTC)
Besides {{L.}} "Latin", there are also {{OL.}} "Old Latin", {{VL.}} "Vulgar Latin", {{ML.}} "Medieval Latin", {{NL.}} "New Latin", {{LL.}} "Late Latin", for different periods of Latin. All of them have the same ISO code but mean different things, and therefore categorize in different categories. Neo-Latin coinages (like the names of chemical elements) are really loanwords in all languages, even Romance. For cases such as these, borrowing/inheritance dichotomy can be maintained by not choosing {{L.}}/{{VL.}}. --Ivan Štambuk 18:09, 29 January 2008 (UTC)
For modern borrowings, it might be difficult to trace the original language: television, for example (see w:Constantin Perskyi). Physchim62 14:28, 1 February 2008 (UTC)

Loan translations

I also have come upon the phenomenon of loan translations or calques which I believe is a descendant phenomenon of [Category: language derivations], on par with loanwords. __meco 07:56, 7 February 2008 (UTC)

List of all Hungarian words in Wiktionary

Would you by any chance have a dump of all Hungarian words that are currently in Wiktionary? I'd like to go through each to make sure they are at least in one category. This list would also help me to update the Hungarian Index. Thanks. --Panda10 23:26, 28 January 2008 (UTC)

I have created a list of them from the most recent XML dump (20080116) and put it at User:Panda10/Hungarian. (This is my first time in playing with the XML dumps properly, so if something seems wrong let me know). Conrad.Irwin 00:01, 29 January 2008 (UTC)
Thank you! :) --Panda10 00:35, 29 January 2008 (UTC)

Conrad, I've noticed that the list contains duplicate entries, e.g. alvajáró alvajáró. Also, the number of words (2886) seems to be higher than on the Statistics page (2647). --Panda10 02:21, 29 January 2008 (UTC)

Ok, there were duplicates - not sure why, so I have cut them and also added an automatic table of contents to every heading (more for my programming enjoyment than any other reason). The number now comes to 2678 - which is much closer to the Statistics figure. Conrad.Irwin 13:20, 29 January 2008 (UTC)
Oh, the table of contents is extremely helpful. Thanks for the correction. --Panda10 16:18, 29 January 2008 (UTC)
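For anyone wanting to reproduce a list like the one discussed above, here is a minimal sketch in Python. It is not the script Conrad used; it assumes that a page counts as Hungarian if its wikitext contains a "==Hungarian==" heading, and the function name hungarian_titles and the SAMPLE data are made up for illustration. Titles are de-duplicated while keeping their first-seen order, which avoids the doubled entries noted above.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace prefix ('{uri}page' -> 'page')."""
    return tag.rsplit("}", 1)[-1]


def hungarian_titles(source):
    """Return unique page titles whose text has a ==Hungarian== heading.

    `source` is a filename or binary file object holding a
    MediaWiki-style XML export (assumed layout: page > title, text).
    """
    seen = set()
    titles = []
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # Assumption: a level-2 language heading marks the entry.
            if title and "==Hungarian==" in text and title not in seen:
                seen.add(title)
                titles.append(title)
            title, text = None, ""
            elem.clear()  # free memory on large dumps
    return titles


# Tiny hypothetical dump with one duplicated page.
SAMPLE = """<mediawiki>
<page><title>alvajáró</title><revision><text>==Hungarian==
===Noun===</text></revision></page>
<page><title>water</title><revision><text>==English==
===Noun===</text></revision></page>
<page><title>alvajáró</title><revision><text>==Hungarian==
===Noun===</text></revision></page>
<page><title>ház</title><revision><text>==Hungarian==
===Noun===</text></revision></page>
</mediawiki>""".encode("utf-8")

if __name__ == "__main__":
    for t in hungarian_titles(io.BytesIO(SAMPLE)):
        print(t)
```

A count taken this way can still differ from the Statistics page, since the two may not define an "entry" the same way.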

Correcting incorrect Reconstructed language links

After seeing a couple links to Proto-Indo-European terms in the normal namespace (instead of in the Appendix namespace where they should be), I made a list of all links that start with "*" (~210 of them). The problem is that I am nowhere near capable of fixing them. We have few suitable existing PIE entries. Any suggestions? Should we redlink them to the Appendix namespace and worry about them later? --Bequw¢τ 18:50, 29 January 2008 (UTC)

Well, {{proto}} won't redlink if the appropriate entry doesn't exist in the Appendix namespace, so yes, they should be {proto}-ized nevertheless. Some of those are already fixed, and some of those are wrong (fsck linking to *nix) --Ivan Štambuk 19:15, 29 January 2008 (UTC)
Thanks, I wasn't aware of the template. --Bequw¢τ 21:44, 30 January 2008 (UTC)
Well, {{proto}} won't redlink if the appropriate entry doesn't exist in the Appendix namespace
See User:Hippietrail/ajaxtranslinks.js for how this could be solved in a JavaScript extension. — hippietrail 00:11, 31 January 2008 (UTC)
For direct substitutions, I (or anyone that has downloaded a copy of Pywikipediabot and followed the instructions on meta) can replace all entries that point the wrong way, for each of those 210 redirects in one pass each. If you post the list of all 210 of them somewhere (probably The grease pit is best) then they shouldn't take too long to fix. --Connel MacKenzie 21:36, 2 February 2008 (UTC)
Thanks for the idea, but it's hard to do direct subs as you have to look around sometimes for what the protolanguage is (Proto-Baltic, etc.). I did them by hand anyways. --Bequw¢τ 13:43, 5 February 2008 (UTC)
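As a rough illustration of how a list of links starting with "*" could be gathered, here is a small Python sketch. It is only an assumption about the approach, not the method actually used; the pattern name STAR_LINK and the function starred_links are hypothetical. It finds wikilinks whose target begins with an asterisk and leaves the choice of protolanguage to a human, since that cannot be read off the link itself.

```python
import re

# Matches [[*target]] and [[*target|label]]; captures only the target.
STAR_LINK = re.compile(r"\[\[(\*[^\]|#]+)(?:\|[^\]]*)?\]\]")


def starred_links(wikitext):
    """Return the targets of all wikilinks that start with '*'."""
    return [m.group(1).strip() for m in STAR_LINK.finditer(wikitext)]


if __name__ == "__main__":
    sample = "From [[*ph₂tḗr]]; compare [[*nix|Unix-like]] and [[water]]."
    print(starred_links(sample))
```

The sample also shows the kind of false positive mentioned above: a link to *nix is picked up even though it is not a reconstructed term, so the output still needs review by hand.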

Alternative spellings taking up too much prime real estate

See rock and roll. I have just inserted 4 alternate spellings (There may be more.) into this entry. The linear layout takes up too much space, IMO. Is it cool to arrange them in a row? Is this a good use of the rel template? What would go at the head of the line or in the gloss? DCDuring TALK 23:27, 29 January 2008 (UTC)

  • I say we move them to the spot above translations. bd2412 T 23:34, 29 January 2008 (UTC)
I see no reason for them to be above the definitions. However a better treatment might be to have a section somewhere for "Information", which could contain homophones, alternative spellings, audio files, phonetic transcription and any other section that only has one or two words in it - Possibly a line of each prefixed by a bold This line contains:. I believe this would be much neater than having always a full heading for each sparsely populated part of the entry. Conrad.Irwin 23:40, 29 January 2008 (UTC)
As stated on WT:ELE, we've tended to put sections ahead of the definitions that are independent of definitions and part of speech. The page for rock and roll is a good example of why we do this. If we put them after the part of speech and before the Translations, then the list would have to be given twice on the page instead of once; it would have to be included before the Noun Translations and before the Verb Translations. "Wiktionary is not paper", so we need not worry about the real estate costs. --EncycloPetey 02:15, 30 January 2008 (UTC)
I'm not following your reasoning. If rock and roll is the model, the best place for alternative spellings would be under usage notes, before synonyms, unless those are being defined as a subset of the noun, verb, etc. Maybe even a subsection of usage notes because that's what they really amount to. Or just drop them to the bottom with all the oh, and by the way stuff that maybe, on a busy day, 1 in 20 users have an interest in. It also solves your twice issue.--Halliburton Shill 05:14, 30 January 2008 (UTC)
"Attention is limited", "Patience is short", "Users miss things" are reasons why we should not let "Wiktionary is not paper" lead us to squader that initial screen space. I don't like the wasted white space on the right of the ToC or any of the above-the-inflection line material that could be more horizontal. I think I like the logical structure of what appears above-the-line, just not the vertical-space cost. My valuation of that space could be drastically reduced if there were any facts or fact-based inferences or even experienced web-designer opinion about user behavior that suggested that it had no impact. DCDuring TALK 02:36, 30 January 2008 (UTC)
I agree with Conrad and DCDuring (and with BD2412 - oh, wait, that's me). Let's redesign. In fact, the definition should be the first thing on the page. Etymologies can be split below, by definition, as we do with translations now. bd2412 T 03:14, 30 January 2008 (UTC)
Am I understanding you correctly? You want a separate etymology for each definition the same way we do for Translations? Do you realize how much clutter that would add to pages? A word with seven definitions would get seven etymologies. A word like or would be a total mess. The Etymology functions as a grouping element, showing which definitions are related and helping the user to organize the many definitions mentally. Taking that away means there are just a bunch of unrelated definitions. --EncycloPetey 03:24, 30 January 2008 (UTC)
Not that this seems like a bad idea, but that would require shuffling everything in every existing entry, while presumably maintaining the existing relationships somehow (could be done by assigning unique anchors to each sense). Can you show us a draft of how this would work? -- Visviva 03:30, 30 January 2008 (UTC)
I don't know if I have the technical capacity to pull off what I'm thinking. Something like the way footnotes work in 'pedia, though. bd2412 T 04:34, 30 January 2008 (UTC)
Our layout is awful; this is partly our fault but mostly because MediaWiki was not designed with dictionary entries in mind. However, as EP notes there is a certain degree of method to the current madness. I think it would be optimal to have a mirror that would process the highly-structured data of Wiktionary entries into a really human-friendly format. That would take advantage of the fact that our entries are currently designed more with machine readability than human readability in mind. In the meantime, collapsible boxes are a helpful tool for reducing visual noise without outraging the formattistas. -- Visviva 03:18, 30 January 2008 (UTC)
Hmmm. Suppose, as an alternative, we put everything in collapsible boxes (except the definitions themselves). Then nothing would be particularly pushing the defs down. Although I think our boxes should be improved to somewhat more boldly say, e.g., "Click here to see [etymology/alternative spellings/translations]". bd2412 T 03:25, 30 January 2008 (UTC)
We've already lost many users who used to add Translations because they don't understand the collapsible boxes. I'm talking about users who aren't fluent in English. The number of times I've seen someone add a "translation" to the TTBC section (because it's the only section not collapsed in a box) has grown over past months, with a corresponding loss to our Translations. If we collapse everything in boxes we won't look any different than the Russian Wiktionary (which has hundreds of thousands of pages with no content). --EncycloPetey 03:30, 30 January 2008 (UTC)
You know, I was just going to say that myself. I think that is an excellent idea. While it would call for a fairly drastic change to Wiktionary, I think it a wise one. Anyone else? Atelaes 03:27, 30 January 2008 (UTC)
Another way to redesign without redesigning would be to have a customized TOC that summarizes the primary meanings of the word in each language. Automatically-generated semantic TOCs are one of the key benefits of MediaWiki, but they don't work here because we use the headings for other purposes. In the first instance this could be applied only to those entries with serious scrolling problems (like go or run). I've been mulling this since whenever, will see if I can't patch a rough draft together. -- Visviva 03:30, 30 January 2008 (UTC)
  • This goes back several comments.
I've been looking at some pages and the overall problem is not solvable without some software that would allow user-preferences, resettable on the fly, that allowed default hiding of:
  1. all but user-selected languages in the ToC and the body,
  2. narrow-context senses, except for those the user wants
  3. user-selected above-the-line headings
and user selectable display of the hidden items on demand.
Massive use of rel tmplts and some use of the white space to the right of the ToC wouldn't hurt.
I don't know which non-WT projects would share what I see as the need for these kinds of things. Nor do I know how hard the software changes would be to develop and what resource load they would place on the servers. And it is all premised on a be-nice-to-the-users-we-know-next-to-nothing-about philosophy. Arrgghh! DCDuring TALK 03:41, 30 January 2008 (UTC)
While it's not exactly what you suggest, what do you think of putting links at the top of the articles that jump directly to the definition section, like I did here? This at least tries to address the problem you're getting at without a software change (which would be a long-term project). Dmcdevit·t 04:00, 30 January 2008 (UTC)
That idea has some merit, but should be developed so that the language jumped to is clear. --EncycloPetey 04:07, 30 January 2008 (UTC)
You can change the label easily enough by editing that template parameter. We could also put as many as we want, labeled however makes best sense ("Jump to Danish" "Jump to English, noun sense" etc.). Dmcdevit·t 04:18, 30 January 2008 (UTC)
I don't believe the first PoS header should ever be far enough down the page to make that useful. Further, isn't the already-bulky TOC there for such jump-to linkage? Doesn't the extra in-page link just excessively push the first header that much further down a page? And, lastly, did you also notice that the "jump to" template currently is causing (I think) an extra line-break between the header and first definition? -- Thisis0 07:51, 30 January 2008 (UTC)
I have written some javascript WT:PREFS {(Experimental) Use Conrad Irwin's Paper view & language button view alternate (not so buggy today)} that can work out which sense matches up to which translations (etc.) automatically. This is a fairly ugly proof of concept kind of thing, but it may give people some ideas as to how we can improve the situation. At the moment it gives you the choice between normal view, having lots and lots of Toggle buttons, or having only the definitions - and I have been trying to think of ways in which I can trim down the display and trim up the content. The other idea that goes hand in hand with this is some kind of editing helper. If and when I have the time, I plan to try to write a tool that allows people to add translations without going through the whole MediaWiki edit interface, but that is a long way ahead - I would prefer to get display nice before editing nice. Ideally in the future, we will move all of this javascript into php, but for the moment it is much quicker and easier to develop on the javascript side. Conrad.Irwin 11:41, 30 January 2008 (UTC)
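To make the idea of matching senses to translation tables concrete, here is a toy sketch in Python. It is not the javascript referred to above and makes no claim about how that script works. It assumes definitions are lines starting with "#", that each translation table opens with {{trans-top|gloss}}, and that a gloss belongs to the definition it shares the most words with; the names match_senses and ENTRY are hypothetical.

```python
import re

# Captures the gloss from {{trans-top|gloss}} (assumed table opener).
TRANS_TOP = re.compile(r"\{\{trans-top\|([^}|]*)")


def words(s):
    """Lower-cased set of alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", s.lower()))


def match_senses(wikitext):
    """Map each translation gloss to the definition sharing most words."""
    senses = [
        line[1:].strip()
        for line in wikitext.splitlines()
        # Skip example (#:), quotation (#*) and sub-sense (##) lines.
        if line.startswith("#") and not line.startswith(("#:", "#*", "##"))
    ]
    result = {}
    for m in TRANS_TOP.finditer(wikitext):
        gloss = m.group(1).strip()
        best = max(senses, key=lambda s: len(words(s) & words(gloss)),
                   default=None)
        # Only record a match when at least one word is shared.
        if best is not None and words(best) & words(gloss):
            result[gloss] = best
    return result


# Made-up entry text for illustration.
ENTRY = """===Noun===
# A genre of popular music.
#: They played rock and roll all night.
# A style of vigorous dancing.

====Translations====
{{trans-top|music genre}}
{{trans-bottom}}
{{trans-top|dancing style}}
{{trans-bottom}}
"""

if __name__ == "__main__":
    for gloss, sense in match_senses(ENTRY).items():
        print(gloss, "->", sense)
```

Word overlap is a crude heuristic and will mis-match glosses that paraphrase the definition, so a real tool would likely need something sturdier, such as the unique anchors per sense suggested earlier in this thread.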
The carriage return was a Wikipedia leftover in {{note label}}, and has been removed. Of course if we were really going to use this solution on a wide scale we'd want a more specialized template for the job. Personally I think it would be ideal if a condensed custom TOC replaced the default TOC, which is very difficult to scan because it simply consists of the same handful of words over and over again. But a lot of work would need to be done. -- Visviva 12:14, 30 January 2008 (UTC)
Right, and the main problem with the TOC is that if you really want to get to the definition, you have to click on "Adjective" or "Verb," which is not obvious. Dmcdevit·t 21:29, 30 January 2008 (UTC)