Wiktionary:Beer parlour/2011/February

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Beer parlour archives

2024 · 2023 · 2022 · 2021 · 2020 · 2019 · 2018 · 2017 · 2016 · 2015 · 2014 · 2013 · 2012 · 2011 · 2010 · 2009 · 2008 · 2007 · 2006 · 2005 · 2004 · 2003 · 2002


Independence.

How do y'all interpret this passage from WT:CFI?

Independence

This is meant to exclude multiple references that draw on each other. Where Wikipedia has an article on a given subject, and that article is mirrored by an external site the use of certain words on the mirror site would not be independent. It is quite common to find that material on one site is readily traced to another. Similarly, the same quote will often occur verbatim in separate sources. While the sources may be independent of each other, the usages in question are clearly not.

The presumption is that if a term is only used in a narrow community, there is no need to refer to a general dictionary such as this one to find its meaning.

There are a few things here that confuse or bother me (like, what's with all the non–durably-archived examples?), but the biggest is that the two paragraphs seem to be describing two very different phenomena, but the second one is worded in such a way as to imply that it's actually just explaining the rationale for the first. I doubt there's much correlation between mirroring or verbatim quotation on the one hand and sharing "narrow community" on the other.

Because the section is so vague and seemingly self-contradictory, it's hard to resolve certain RFV questions that hinge on it. For example, as I wrote at the RFV discussion about novum, the term seems to get tens of thousands of b.g.c. hits in the relevant sense — which is really quite a lot — but it's hard to find hits that don't acknowledge Darko Suvin's coinage of the term. Does that make them non-independent? Other affected terms would be ambient findability (where most uses are in books that also mention the book Ambient Findability and/or its author), osmotic communication (where most uses are in books that also mention Alistair Cockburn, who coined the term, and/or Crystal Clear, his development methodology), and possibly iDollator (where almost the only news articles using it are articles about one selfsame man — who, incidentally, wants to popularize and standardize the term).

My preferred interpretation would be something like this:

Non-independence is reflexive, symmetric, and transitive.
Verbatim quotations, near-verbatim quotations, and translations are not independent of their original source.
Multiple quotations from a single author are not independent of each other.

but I'd like input before I start applying that interpretation to RFV discussions.

—Ruakh_TALK 03:51, 1 February 2011 (UTC)

I'm going to raise another question, with an example. In 1983, Dave Gabai published an article, "Foliations and the topology of 3-manifolds", in the Journal of Differential Geometry, in which he first described a process called disk decomposition: a means of breaking up certain topological spaces into smaller ones ("disk-decomposing" them). This caught on and became an important tool in the low-dimensional topologist's toolkit. For several years thereafter, probably every reference to disk decomposition added a caveat as in "disk decomposition in the sense of Gabai" or "disk decomposition as in [4]" (where "[4]" is an item in the list of references at the back, viz Gabai's paper). Does that mean they're dependent? Later papers didn't use such caveats, as disk decomposition was by then well known (in the right circles), and so only said "disk decomposition [4]", where, again, "[4]" is a reference to Gabai. (I'd not be surprised to find that papers still do that.) Are these dependent? They're still "readily traced to" Gabai (to quote the CFI as quoted above). According to Ruakh's "preferred interpretation" above, seemingly all of the above are independent from Gabai (except any written by him, naturally).—msh210℠ (talk) 08:46, 1 February 2011 (UTC)

As a quick and first-impression response, the section's current wording seems worth scratching and replacing with something like your bullets. --Dan Polansky 09:45, 1 February 2011 (UTC)

Addendum: Curselax failed RFV because of this criterion, since all quotations were from postings to a single Usenet group. (See Talk:Curselax.) I participated in that decision, and still agree with it, but it's actually not covered by what I gave above as my preferred interpretation. Maybe a fourth criterion:

Multiple quotations from the same book, periodical, or Usenet group are not independent of each other.

? —Ruakh_TALK 18:30, 1 February 2011 (UTC)

Sounds good. If this bullet point turns out to be too stringent, which is merely a hypothetical possibility right now, it can be amended later. --Dan Polansky 18:38, 1 February 2011 (UTC)

Multiple authors' posts in the same Usenet newsgroup seem akin to multiple authors' works on the same topic in different books and by different publishers. The only "dependencies" inherent in a newsgroup are (1) that the authors are (usually) writing about similar topics and (2) that the authors (usually) have read each other's posts. (This argument does not apply to posts in the same thread, where a later author is replying to an earlier one (or to someone who replied to an earlier one, or some iteration of that), and so will often use a word the earlier one used just because the earlier one did so.)—msh210℠ on a public computer 20:58, 1 February 2011 (UTC)

Multiple quotes in the same book (but by different authors, as in a compilation or Festschrift) or periodical would seem to be independent in general, as authors, I think, choose their words, for the most part. Copy editors, however, choose spelling, so perhaps they should be considered dependent for purposes of proving a particular spelling. (Of course, the way RFV usually works, attesting a particular spelling is attesting the word, so this distinction might be too fine.)—msh210℠ on a public computer 20:58, 1 February 2011 (UTC)

In a newsgroup not only have the writers likely read each other's posts, but they are also writing for each other -slash- for a shared audience. It's like when one author takes over another's series, and the target audience is basically "everyone who liked the first part enough that they're willing to put up with the new author's inability or unwillingness to emulate the old". (Of course, in the latter case the original author is usually still listed as a co-author, so it's covered anyway. :-P ) As for quotations from the same book or periodical — fair enough. I was mostly thinking of spellings (such as diacritics in The New Yorker) and perhaps construals (such as the choice of preposition after a verb, which seems potentially subject to house style), but that may be too narrow a case to try to catch with a CFI rule. But be warned that if all issues of Penthouse are independent of each other, then "I never thought it would happen to me" is in "clearly widespread use". ;-) —Ruakh_TALK 21:12, 1 February 2011 (UTC)

Your second two bullet points seem to sum up the situation well, as I understood it too. I can't quite work out what the first one means though. Ƿidsiþ 10:59, 2 February 2011 (UTC)
The first means that if work A is dependent on (by which I mean not independent of) work B, and B on C, then B is also dependent on A, A on C, and A on A. (And, therefore, C is dependent on A, C on B, B on B, and C on C.)—msh210℠ (talk) 19:32, 2 February 2011 (UTC)
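Read together, the proposed bullets make non-independence an equivalence relation, so a set of quotations falls into classes of mutually dependent works, and attestation would count classes rather than raw quotations. The Python sketch below assumes each quotation is reduced to a hypothetical `Citation` record (author, source, and optionally the index of an original it quotes verbatim or translates) and that the addendum's same-book/periodical/Usenet-group rule is adopted. It merges dependent quotations with a disjoint-set (union-find) structure, which supplies symmetry and transitivity automatically. It illustrates the proposal only; it is not an existing Wiktionary tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical record for one quotation; not an actual Wiktionary data model."""
    author: str
    source: str                         # book, periodical, or Usenet group
    derived_from: Optional[int] = None  # index of the original it quotes or translates


class DisjointSet:
    """Union-find: merging classes makes the relation symmetric and transitive."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each item starts in its own class (reflexive)

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)


def independent_classes(cites: list[Citation]) -> int:
    """Count classes of mutually dependent quotations under the proposed bullets."""
    ds = DisjointSet(len(cites))
    first_seen: dict[tuple[str, str], int] = {}
    for i, c in enumerate(cites):
        # Same author, or same book/periodical/Usenet group: not independent.
        for key in (("author", c.author), ("source", c.source)):
            if key in first_seen:
                ds.union(i, first_seen[key])
            else:
                first_seen[key] = i
        # Verbatim quotations and translations depend on their original.
        if c.derived_from is not None:
            ds.union(i, c.derived_from)
    return len({ds.find(i) for i in range(len(cites))})


if __name__ == "__main__":
    same_group = [Citation("a", "alt.x"), Citation("b", "alt.x"), Citation("c", "alt.x")]
    print(independent_classes(same_group))  # 1: one Usenet group (the Curselax case)
    chained = [Citation("x", "S1"), Citation("y", "S1"), Citation("y", "S2")]
    print(independent_classes(chained))     # 1: linked via S1, then via author y
```

In the second example, the first and third quotations share neither author nor source yet land in one class, which is the transitivity msh210 spells out. Dropping the `source` key would implement the weaker position that separate posters in one newsgroup are independent.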

This section would benefit massively from being entirely rewritten. Mglovesfun (talk) 11:41, 2 February 2011 (UTC)

Languages and flags

Hi there,

I just noticed that, of 35 Wiktionary editions I have analysed, only 4 of them (Italian, Greek, Lithuanian, Hungarian) use flag icons to denote languages.

I'm not too sure about Wiktionary, but I know for a fact that at Wikipedia this practice was abandoned and flags dropped for very good reasons:

Relationship between languages and countries is generally not 1-to-1. In fact, it's not even one-to-many, it's many-to-many.
Flags tend to stir up all sorts of nationalist arguments that we'd rather avoid.

Please see the current discussion on Italian Wiktionary.

My feeling is that we should get rid of flags in all Wiktionary editions, especially silly ones like , or , or (my god) . Thoughts? 220.100.118.132 13:25, 1 February 2011 (UTC)

The Portuguese Wiktionary uses flags too: http://pt.wiktionary.org/wiki/ser

The English Wiktionary uses flags of countries and other geographical representations occasionally on categories of languages: Category:Cantonese language, Category:Old Frisian language, Category:Frisian languages, Category:Manx language, Category:Old Prussian language... --Daniel. 13:43, 1 February 2011 (UTC)

I agree that using the flags of countries to represent languages is a bad idea. For what it's worth, File:English language.svg is inaccurate. It lacks about 97% of the languages listed at the bottom of Category:English language. --Daniel. 13:43, 1 February 2011 (UTC)

Like I said, flags should only be an option, not the default. -- Prince Kassad 15:45, 1 February 2011 (UTC)

What I would like to achieve is consensus towards demoting this use of flags, from option to strongly deprecated option, or something like that.

Just to expand on the technical point, getting flags right for languages is not simply a hard problem, it's an impossible task. The reason is that the mapping is many-to-many, and so if you see (or should I say, manage to spot) e.g. a Swiss flag in one of those mongrel flags, you have to look at the rest of it to finally work out what language the flag was supposed to convey "at a glance" - proving that its intended purpose has completely been defeated, because it's much quicker and easier to just read the word representing the language. So, no matter what alternative representation you come up with, you will always get it wrong, by construction.

This is to say nothing of those distracting stroboscopic horrors that ignore a well-established web design best practice of not flashing contents - or in this case, non-contents. I mean, I respect the effort people have put into creating these things, but in my opinion they have to go. 220.100.118.132 23:45, 1 February 2011 (UTC)

I am strongly opposed to telling other language communities what they should and should not be doing stylistically on their projects. I am strongly opposed to removing optional features which in no way detract from the general usage of the project, especially on account of merely potential future problems. - [The]DaveRoss 23:59, 1 February 2011 (UTC)

I'm totally with you on your first point. I'm not at all with you on your second, in that I think all optional features have a cost, if only in that it's hard to find the useful optional features if there are too many useless ones. In general, I'm inclined to take a descriptivist view (let's get rid of features that no one is using) rather than a prescriptivist one (let's get rid of features I don't want people to use), but in the particular case of flags, we're more or less endorsing a specific set of language-to-flag mappings (by making it, and only it, available through the "Gadgets" tab of Special:Preferences), so if there's significant risk of people being offended by that endorsement, I would definitely support removing it from gadgets and making people who really want it add it to their vector.css or whatnot. —Ruakh_TALK 00:11, 2 February 2011 (UTC)

I'm with Dave on this one. I do understand Ruakh's point, but to take the descriptivist approach, you'd need to do a survey of who's using what, which might not be realistic -- for example, how many WT users, who may or may not use / like / not use / dislike the flags, will never register and never see this posting, yet have strong opinions about features of WT that they do make use of?

My own ¥2 here is that I quite like the flags, as they give me a very easy-to-scan visual cue to look for. I can very quickly scroll through a long entry and tell whether there's a Navajo or Japanese entry, for instance, just from the colors -- no reading required, which is easier on the eyes and quicker to visually parse. I think they improve the site's usability. -- Cheers, Eiríkr Útlendi | Tala við mig 03:06, 2 February 2011 (UTC)

I agree strongly with the anonip that we should not have flags, and agree with Ruakh re "we're more or less endorsing a specific set of language-to-flag mappings […] , so if there's significant risk of people being offended by that endorsement, I would definitely support removing it from gadgets" (and I think there is such risk).—msh210℠ (talk) 03:44, 2 February 2011 (UTC)

Yet, the flags are only an option -- you have to turn on the gadget to get them to display at all. So far as I understand it, there is no risk of John Q. Public / Yusef Mustafa / Juan Carlos / Ernst Baumann / etc. wandering over and seeing the flags in any default configuration, which eliminates most of the people that might be offended. This leads me to wonder if we might be ultimately catering to offensensitivity? -- Eiríkr Útlendi | Tala við mig 04:03, 2 February 2011 (UTC)

My understanding is that John Public can currently see the flags in the Wiktionary editions that have them turned on. Five such editions were identified above. 205.228.108.58 05:28, 2 February 2011 (UTC)

Well, I certainly agree with TDR that we should not be deciding for (or, really, even suggesting to) other Wiktionaries that they cease using flags.—msh210℠ (talk) 06:01, 2 February 2011 (UTC)

Fair enough, I'm not well-versed in localisation policy. 205.228.108.58 07:16, 2 February 2011 (UTC)

It's not just offensensitivity. Displaying a single flag also implies stuff about dialect that we don't (usually) wish to imply. Plus there's Ruakh's too-many-gadgets point.—msh210℠ (talk) 06:01, 2 February 2011 (UTC)

I have explicitly included the hybrid flags above to fully display their ugliness, although aesthetics is subjective and it is not my main point.

I agree with Eirikr that the original intention of flags is to be "easier on the eyes and quicker to visually parse", and if it were up to me, I'd just stick to the one, de facto standard flag. However, for languages that (unlike Japanese and Navajo) don't have a trivial language-country relationship, people do start to take issue with it, and there is no right answer to this ill-posed question, which is why we end up with such... elaborate solutions. 205.228.108.58 05:14, 2 February 2011 (UTC)

While it's true on one side that the mapping between languages and countries is many-to-many, we are a dictionary. Dictionaries describe words and their origins; they don't try to describe various countries and cultures, that's encyclopedia material. So what we could do is use the flag of the place the language is named after. We use an English flag (the red cross one) because English is named after England, a Japanese flag because Japanese is named after Japan, and so on. This means we don't need to deal with all the different places where a language is spoken, because we can just go with the etymological origin of the language's name. After all, even Americans call their language English... —CodeCat 10:45, 2 February 2011 (UTC)

Then it becomes impossible to do consistently because Klingon, Esperanto, etc. aren't named after countries. I bet there are some where the origin is disputed too; more contentious politics. Equinox ◑ 10:53, 2 February 2011 (UTC)

For what it's worth, Esperanto was a particularly easy flag to choose, and a quick Commons search turns up a singular flag for that language as well. Certainly the lack of a one-to-one mapping makes it a challenge, but the ability to have multiple flags, or even options allowing people to choose which flag they want displayed for each language, would alleviate these concerns. - [The]DaveRoss 14:23, 2 February 2011 (UTC)

If we can get a high consensus on one or more sets of language-flag mappings, that set or those sets could be offered as a gadget. But I, for one, am strongly opposed to the use of the UK flag to represent English on any such set. I may have other strongly held beliefs in specific cases where some kind of implicit minimisation of minority languages or political groupings is involved. Language names themselves can be taken as offensive to some, but have the advantage of long-standing and inevitable use, which can be attested using our customary methods or by appeal to external "authorities". DCDuring TALK 12:19, 2 February 2011 (UTC)

This whole debate is why I had the foresight to write Flag Law; I don't remember where that was, though. - [The]DaveRoss 03:13, 4 February 2011 (UTC)

External links are external links

The entries English, Anglo-Norman and base contain simultaneously the sections "External links" and "See also", that apparently are interchangeable. There are links to Wikipedia and to 1911 Encyclopædia Britannica in both sections.

Compare with talent, mouse and Texas, that link to encyclopedias (including Wikipedia) and other external sites using the "External links" section.

Also compare with second, nostrum and marionette, that contain external links in the "See also" section.

I may or may not be able to once more describe and rationalize an apparent practice, and try to explain this discrepancy in the usage of sections. But, frankly, it seems just too random. Instead, I am going directly to propose a guideline that I believe would look good, be relatively easy to implement and even easier to maintain, and, most importantly, provide consistency.

I propose always using the "External links" and never the "See also" to place external links.

Naturally, boxes such as {{wikipedia}} would be exceptions to the proposed rule, because they are supposed to fit virtually anywhere. That's it. --Daniel. 11:13, 2 February 2011 (UTC)

Interestingly, the ELE says nothing about this issue. But yeah, See also should only contain internal links, and External links should have all links which lead out of English Wiktionary. -- Prince Kassad 18:54, 2 February 2011 (UTC)

Thirded.—msh210℠ (talk) 19:28, 2 February 2011 (UTC)

Curiously (given the comments above) I'd prefer everything under see also, unless there's enough content that it's better to separate them for tidiness purposes. What about {{pedia}} then? Mglovesfun (talk) 14:13, 3 February 2011 (UTC)

I have been putting {{pedia}} under See also, given that Wikipedia is semi-external to Wiktionary. If a plain majority prefers putting {{pedia}}, {{commonslite}}, etc. under "External links" rather than "See also", okay with me. --Dan Polansky 14:22, 3 February 2011 (UTC)

Mglovesfun, how much content is enough content? After pondering this subject, I came to the conclusion that I prefer never placing external links (including {{pedia}}) under see also. One reason for my preference, as I stated, is consistency, which is good by itself. Another reason is that "External links" clarifies the boundary between Wiktionary and other websites. Compare with how "See also" literally implies "You, Wiktionary user, who came to see a definition, perhaps an etymology, derived terms, inflections and maybe more linguistic information, take your time to admire this encyclopedical article, or list of images, or this additional dictionary, too." We don't want to send this message. Do we? --Daniel. 14:31, 3 February 2011 (UTC)

Hyper-verbs

So, is "Hyper-verbs" an acceptable header? Does it mean something? See the current revision of punch. --Daniel. 16:50, 2 February 2011 (UTC)

Presumably he meant hyperonyms. I've changed it now. —Ruakh_TALK 18:05, 2 February 2011 (UTC)

The usual Wiktionary heading is "hypernym". See also WT:ELE#Further semantic relations. --Dan Polansky 18:15, 2 February 2011 (UTC)

Wiktionary talk:Etymology

In trying to proofread the page a bit, I've come across two issues, which I've put on the talk page (bottom two, as of this date and time). Mglovesfun (talk) 14:11, 3 February 2011 (UTC)

Linking to Commons.

The File:Compaq keyboard and mouse cropped.jpg contains an automatic message:

This file is from Wikimedia Commons and may be used by other projects. The description on its file description page there is shown below.

All images linked from Commons to Wiktionary share that text. I personally consider it a little hard to spot. I propose deleting that bland message and replacing it with the box from w:File:Compaq keyboard and mouse cropped.jpg.

Thoughts? --Daniel. 16:22, 3 February 2011 (UTC)

Sounds like a good idea. I've copied over the message. --Yair rand (talk) 23:00, 3 February 2011 (UTC)

Thanks. --Daniel. 09:49, 4 February 2011 (UTC)

Sense IDs

Previous discussions: Wiktionary:Grease_pit/2010/June#Sense_referentials_and_links, Wiktionary:Grease_pit/2010/July#Stable_identifiers_for_meanings

Wiktionary has the problem of not being able to refer to specific definitions in links, which could be fixed by adding anchors containing glosses to individual definitions. The template {{senseid}} could work for this, if there was a simple way to add glosses to links via existing link templates. I propose that {{senseid}} be allowed for general use in the mainspace, but not be bot-added to all entries yet, and that {{l}} be changed to accept the id= parameter to link to definitions with glosses ({{l|en|peach|id=fruit}} would link here). --Yair rand (talk) 02:24, 4 February 2011 (UTC)
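The mechanism proposed here amounts to mapping a page, a language and a sense ID to a link with a fragment. A rough Python sketch of that mapping follows; the anchor format `<lang>-<id>` and all function names are hypothetical assumptions for illustration, not the actual behaviour of {{senseid}} or {{l}}. The duplicate check reflects the constraint that an ID used on two senses of the same entry would make the anchor ambiguous.

```python
from collections import Counter
from typing import Optional


def sense_anchor(lang_code: str, sense_id: str) -> str:
    """Hypothetical anchor name for one sense; the real {{senseid}} output may differ."""
    return f"{lang_code}-{sense_id}"


def link_target(page: str, lang_code: str, sense_id: Optional[str] = None) -> str:
    """Rough model of the proposed {{l|<lang>|<page>|id=<sense_id>}} behaviour."""
    if sense_id is None:
        # No id: plain page link (the language-section anchor is omitted here).
        return page
    # MediaWiki writes spaces in link fragments as underscores; other special
    # characters are not handled in this sketch.
    return f"{page}#{sense_anchor(lang_code, sense_id).replace(' ', '_')}"


def duplicate_ids(sense_ids: list[str]) -> set[str]:
    """Return ids used by more than one sense: such anchors would be ambiguous."""
    return {sid for sid, count in Counter(sense_ids).items() if count > 1}


if __name__ == "__main__":
    print(link_target("peach", "en", "fruit"))        # peach#en-fruit
    print(duplicate_ids(["fruit", "tree", "fruit"]))  # {'fruit'}
```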

I am unconvinced that this is the best, or even a good solution, but I need to think about it. So I am going to think about it and come back here and see all of the good reasons why my thoughts are dumb lined up for me. Get to it! - [The]DaveRoss 03:11, 4 February 2011 (UTC)

It doesn't 'solve' the problem, but templates such as {{context}} and {{gloss}} could contain anchors. This would only work for senses using these templates, of course, and the same gloss may appear more than once in an entry. Mglovesfun (talk) 00:13, 5 February 2011 (UTC)

Also this would make {{gloss}} better than just writing something inside brackets, which, weirdly, is all the template does right now. No clever span stuff. — This unsigned comment was added by Mglovesfun (talk • contribs) at 5 February 2011.

Adding anchors to {{context}} couldn't really work, as there are often multiple senses of a word that share the same context tag. I don't really see how adding anchors to {{gloss}} would be helpful either. We need to have some way of connecting senses, and if no one has any better way, I don't see why not to use the {{senseid}} template. --Yair rand (talk) 06:09, 7 February 2011 (UTC)

I do not know if this is relevant or useful, but Icelandic entries like falla#Icelandic anchor synonyms to senses. - -sche 02:54, 9 February 2011 (UTC)

Poorly attested languages

The obvious solution, for me anyway, is to have a CFI subpage on attestation instead of listing the exceptions for poorly attested languages on individual language consideration pages (such as WT:About English). Like I say, I favor the use of subpages to 'declutter' the CFI page, so that it contains only criteria for inclusion, not discussion about those criteria.

Anyway, something like Wiktionary:Criteria for inclusion/Attestation should do it. And something like:

"The following are considered exceptions to the 'three durably archived citations' rule as they are poorly attested"

Then stuff like

Ancient Greek: 1
Old English: 2
Old French: 2

Clearly the number can only be one or two, not zero; three is the norm. Mglovesfun (talk) 10:58, 4 February 2011 (UTC)
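The proposed list is essentially a lookup table with a default of three, plus the constraint that an exception may only lower the requirement to one or two. A minimal Python sketch, assuming the example values above (illustrative, not agreed policy); the names are hypothetical:

```python
DEFAULT_CITATIONS = 3  # the usual "three durably archived citations"

# Hypothetical exception list, using the example values proposed above.
EXCEPTIONS: dict[str, int] = {
    "Ancient Greek": 1,
    "Old English": 2,
    "Old French": 2,
}


def check_exceptions(exceptions: dict[str, int]) -> None:
    """Reject entries outside 1-2: zero means no attestation, and 3 is the default."""
    for language, n in exceptions.items():
        if n not in (1, 2):
            raise ValueError(f"{language}: exception must be 1 or 2, got {n}")


def citations_needed(language: str) -> int:
    """Citations required for a language; anything not listed falls back to 3."""
    return EXCEPTIONS.get(language, DEFAULT_CITATIONS)


check_exceptions(EXCEPTIONS)  # fail fast if the list itself is malformed

if __name__ == "__main__":
    print(citations_needed("Ancient Greek"))  # 1
    print(citations_needed("English"))        # 3
```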

But how is one going to add to this list? Via a series of VOTEs? I would favor a general exception for ancient languages (probably obscure ones as well, but it's hard to define that). -- Prince Kassad 11:12, 4 February 2011 (UTC)

Why not just consider all the works in languages with only a small amount available to be "well-known works"? --Yair rand (talk) 20:59, 4 February 2011 (UTC)

Because it twists the meaning of the phrase, and still doesn't define "a small amount". Does "The Flag of My Country. Shikéyah Bidah Na'at'a'í: Navajo New World Readers 2" really count as a well-known work by any standard? I would be generous as to "well-known works" with Navaho, but not that generous.--Prosfilaes 21:12, 4 February 2011 (UTC)

That doesn't strike me as a practical solution. I'm with Kassad; a general exception for ancient languages would be better, though we probably don't want to accept modern translations, like Ancient Greek Harry Potter. We could define obscure languages: say, if the Ethnologue gives them fewer than a million speakers, ask for 2; fewer than 100,000, ask for 1. That doesn't achieve everything; Oromo (17.3 million speakers) is probably a lot harder to cite than Estonian (1.0 million speakers). But it is a definition that will catch all the American and Australian languages.--Prosfilaes 21:12, 4 February 2011 (UTC)
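The sliding scale proposed here can be stated as a threshold function. A minimal Python sketch, assuming Ethnologue speaker counts as input and applying it only to living natural languages, as the proposal intends; the function name is hypothetical:

```python
def required_citations(speakers: int) -> int:
    """Proposed sliding scale of required citations, by number of speakers.

    Assumption: for living natural languages only; dead and constructed
    languages would need their own rules.
    """
    if speakers < 0:
        raise ValueError("speaker count cannot be negative")
    if speakers < 100_000:
        return 1
    if speakers < 1_000_000:
        return 2
    return 3


if __name__ == "__main__":
    print(required_citations(1_000_000))   # 3 (the Estonian figure quoted)
    print(required_citations(17_300_000))  # 3 (the Oromo figure quoted)
```

The cut-offs are strict, so a language at exactly one million speakers, like the Estonian figure quoted, still needs three citations; and, as the Oromo comparison concedes, speaker counts only approximate how hard a language is to cite.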

I'm not sure that this would be controversial, or even 'interesting' enough for editors to disagree over it. Might not take as much finagling as you might think. Re number of speakers: it's not the best criterion, as speakers and written language are independent. Middle French is very well attested because it's relatively recent (post 1400) but has zero speakers, since it's 'become' Modern French. Mglovesfun (talk) 00:09, 5 February 2011 (UTC)

I don't know what the difference between 1, 2 and 3 should be, and if we can set up rules for that, why do we need to discuss each and every one? Number of speakers isn't perfect, but I wasn't suggesting that it be used for dead languages. The only non-dead language that stands out as being more easily attestable than that rule would imply is Yiddish, and given the lack of good Yiddish OCR and of Yiddish readers, asking for only 2 attestations wouldn't be a big deal. Otherwise, it improves the condition of many, many languages dramatically. (I also wasn't planning on it being used for artificial languages, which need their own rules here.)--Prosfilaes 00:44, 5 February 2011 (UTC)

Dawnraybot and pronunciations

Some of the pronunciations added by Dawnraybot are incorrect, at least in scruterais (now fixed) and scruterait (to be fixed), the only ones I checked. There are probably many more pronunciations to be changed. Lmaltier 10:24, 5 February 2011 (UTC)

There are thousands of errors. Good luck finding and changing them all. --Plowman 11:01, 5 February 2011 (UTC)

Which specific forms have errors? If we can't be specific about the forms, we should bot-remove all pronunciations added by Dawnraybot. Nadando 00:19, 6 February 2011 (UTC)

I've been finding errors from Dawnraybot all over the place for a while. The errors aren't limited to specific forms: often the pronunciation of the stem is wrong, [1] [2] [3] but just as often it's the forms ending in /ɛ/ that are shown with /e/, which is understandable because some people pronounce them that way. —Internoob (Disc•Cont) 03:40, 6 February 2011 (UTC)

Yes. For example, the forms of peinturlurer have wrong SAMPA (using the dollar sign). And the forms of récurer have wrong IPA as well. -- Prince Kassad 18:18, 6 February 2011 (UTC)

If Plowman is right with his statement above — and for wonderfully obvious reasons we may probably assume that this is the case — then Nadando's suggestion is probably the easiest solution, even though a number of correct pronunciations might be deleted as well. -- Gauss 18:44, 6 February 2011 (UTC)

I asked a native speaker about the final /e/ vs. /ɛ/. She said they are different, but for most people you won't hear the difference (that is, they pronounce them the same). That's minor enough that I wouldn't worry about listing pronunciations which do exist but aren't the ones given in dictionaries. The harder part, like Internoob says, is just tracking down the ones that are totally wrong. Mglovesfun (talk) 20:50, 6 February 2011 (UTC)

In case of any errors, I apologize: they were not intentionally incorrect. --Plowman 20:55, 6 February 2011 (UTC)

Customizing TOC

Does anyone know how to customize the appearance of the table of contents of an entry? I would like to know the following:

1. How do I make only language names visible using CSS, while hiding "Noun", "Synonyms" and other deeper headings from the TOC?
2. How do I replace the numbered lists with bulleted lists using CSS?
3. How do I hide the list mark altogether?

Thanks for any input. --Dan Polansky 13:51, 6 February 2011 (UTC)

#1: Add table#toc ul ul { display: none; } to your CSS.

#3: Add table#toc span.tocnumber { display: none; } to your CSS.

#2: This is a bit trickier. The version that seems to fit best with the Vector scheme is

table#toc ul
{
  list-style-type: square;
  list-style-image: url(http://bits.wikimedia.org/skins-1.5/vector/images/bullet-icon.png?1); 
  margin-left: 1.5em;
}

(plus the CSS for #3), but you seem to be using Monobook?

—Ruakh_TALK 14:32, 6 February 2011 (UTC)

Thanks! It works well even with Monobook. --Dan Polansky 14:39, 6 February 2011 (UTC)

A Monobook appearance can be obtained by using the Monobook bullet, like this:

table#toc ul {
  list-style-type: square;
  list-style-image: url(http://bits.wikimedia.org/skins-1.5/monobook/bullet.gif?1);
  margin-left: 1.5em;
}

--Dan Polansky 08:13, 8 February 2011 (UTC)

Re #1 (table#toc ul ul { display: none; } ): I have tried it and it has at least one downside: it also applies to the TOC on the Beer parlour, so it hides the headings of the individual discussions. --Dan Polansky 14:54, 11 February 2011 (UTC)

Try .ns-0 table#toc, etc.—msh210℠ (talk) 15:55, 11 February 2011 (UTC)

Works nicely; thanks. Now I would like to do something even more fancy: how do I format the list in the TOC to make it a horizontal list instead of a vertical one, like "English · French · Spanish"? --Dan Polansky 13:31, 13 February 2011 (UTC)

What browser do you have? In IE8, and in common non-IE browsers, you can do

.ns-0 table#toc ul ul { display: none; }
.ns-0 table#toc span.tocnumber { display: none; }
.ns-0 table#toc li { display: inline; }
.ns-0 table#toc li + li:before { content: ' · '; }

but that won't work in IE6 or IE7.
—Ruakh_TALK 15:07, 13 February 2011 (UTC)

I have Firefox 3.6.13, so this works great. I have added .ns-0 div#toctitle { display: none; } to hide the title of the TOC, thereby removing one more vertical element from the box. The result is extremely compact, typically taking only one line. Thanks again. --Dan Polansky 11:16, 14 February 2011 (UTC)

Poll: Etymology and the use of less-than symbol

I would like to ask you about your preference in using "<" vs "from" in etymologies in Wiktionary. Etymology sections in Wiktionary are inconsistent in their use of "<" vs "from".

An example of the two different formats:

From French acide < Latin acidus (“sour, acid”) < aceō (“I am sour”).
From French acide, from Latin acidus (“sour, acid”), from aceō (“I am sour”).

A longer example of the two formats:

Middle English chaufen (“to warm”) < Old French chaufer (modern French chauffer) < Latin calefacere, calfacere (“to make warm”) < calere (“to be warm”) + facere (“to make”). See caldron.
Middle English chaufen (“to warm”), from Old French chaufer (modern French chauffer), from Latin calefacere, calfacere (“to make warm”), from calere (“to be warm”) + facere (“to make”). See caldron.

This poll disregards whether the etymology should start with "from" or omit "from" at the start (or "<", respectively). A preference for "<" is compatible with the format "From A < B < C < D"; the only things in question are the second, third, and later words or symbols.

I tend to prefer "<", but am okay with "from" if this is the majority preference. For me, the use of "<" makes it easier to visually scan the string of items and locate the individual items, while "from" gets more easily lost in the jumble. One argument against the use of "<" is that its meaning is much less obvious than the meaning of "from". But I think the meaning of "<" can be quickly picked up by users of the dictionary. Century 1911 and Encarta[4] use "<", while some other dictionaries, including Merriam-Webster Online[5], use "from".

This poll combines discussion with a clear indication of one's current preference, a preference that can change later as a result of discussion. Feel free to make other proposals and comments alongside your indication of your preference.

Thank you for your attention and your input! --Dan Polansky 08:42, 7 February 2011 (UTC)[reply]

Preference 1

I prefer the use of "<" over the use of "from".

Support Mglovesfun (talk) 12:19, 7 February 2011 (UTC). And the reason is, otherwise you repeat 'from' a lot which I find irritating. I don't feel that strongly about it, I don't mind being outvoted. Mglovesfun (talk) 12:19, 7 February 2011 (UTC)[reply]
Support Dan Polansky 14:14, 7 February 2011 (UTC) As I said, for me, the use of "<" makes it easier to scan the string of items using my eyes and locate the individual items, while "from" gets more easily lost in the jumble. I am okay with going by the option that is preferred by a plain majority. --Dan Polansky 14:14, 7 February 2011 (UTC)[reply]
Support We can do fancy things if we use a "<"; we can do even more fancy things if etymologies are templatized. We can do nothing fancy with a "from". Also, I think it is a much simpler, easier-to-read solution. - [The]DaveRoss 05:04, 9 February 2011 (UTC)[reply]
We can do fancy things if etymologies are templatified. For example, if, instead of (at [[sully]])
From {{etyl|fro|en}} {{term|lang=fro|souillier}} (> {{etyl|fr|-}} {{term|souiller|lang=fr}}). Compare {{term|soil|lang=en}}.

we had
{{from|lang=fro|souillier}} {{whence|etyl=fro|lang=fr|souiller}} {{more at|soil|lang=en}}

and instead of (at [[wend]])
{{etyl|enm|en}} {{term|lang=enm|wenden}} from {{etyl|ang|en}} {{term|lang=ang|wendan||to turn, go}}, causative of {{term|windan|||to wind|lang=ang}}. Akin to {{etyl|ofs|-}} {{term|lang=ofs|wenda}}, {{etyl|osx|-}} {{term|lang=osx|wendian}}, {{etyl|non|-}} {{term|venda||to wend, to turn|lang=non}} ({{etyl|da|-}} {{term|vende|lang=da}}), {{etyl|de|-}} {{term|wenden||to turn|lang=de}} and {{etyl|got|-}} {{term|𐍅𐌰𐌽𐌳𐌾𐌰𐌽|sc=Goth|tr=wandjan|lang=got}}.

we had
{{from|lang=enm|wenden}} {{from|lang=ang|wendan||to turn, go}} {{etymon form of|form=causative|windan||to wind|lang=ang}} {{cognate|lang=ofs|wenda}} {{cognate|lang=osx|wendian}} {{cognate|non|venda||to wend, to turn}} {{whence|etyl=non|vende|lang=da}} {{cognate|wenden||to turn|lang=de}} {{cognate|𐍅𐌰𐌽𐌳𐌾𐌰𐌽|sc=Goth|tr=wandjan|lang=got}}
that'd be both machine-readable and (with some work) human-readable as opposed to the current system which is only the latter. (However, I still think that the human-readable display should include "from" rather than "<". :-) ) But I don't know what fancy things we can do with "<" (untemplatified) that we can't do with "from".—msh210℠ (talk) 16:34, 9 February 2011 (UTC)[reply]
- This is a really cool idea. Perhaps you should create these templates and run up a few examples...? Ƿidsiþ 09:40, 15 February 2011 (UTC)[reply]
We can't easily parse from since it is a word which may appear in an etymology for other reasons. - [The]DaveRoss 20:24, 9 February 2011 (UTC)[reply]
- Making that human-readable would be really difficult. Etymology editing should eventually be done through a WT:EDIT module (hopefully one that also updates the necessary Derived terms/Descendants sections of other entries), rather than editing the wikitext itself, so it doesn't matter all that much if the wikitext is more complicated, so long as the wikitext is machine readable/editable and the output is human readable. --Yair rand (talk) 11:12, 22 February 2011 (UTC)[reply]
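To illustrate the parsing argument, here is a minimal Python sketch (not part of any existing Wiktionary tool; the function names are hypothetical). It assumes that "<" appears in an etymology only as the separator between stages, and it shows how a naive split on ", from " misreads ordinary prose that happens to contain the word "from".

```python
import re


def _strip_leading_from(etymology: str) -> str:
    # The poll leaves the opening "From" out of scope, so drop it if present.
    return re.sub(r"^From\s+", "", etymology.strip())


def split_lt_chain(etymology: str) -> list[str]:
    """Split a "<"-style etymology into stages, newest first.

    Assumes "<" is used only as the separator between stages.
    """
    text = _strip_leading_from(etymology)
    return [stage.strip() for stage in text.split("<")]


def split_from_chain(etymology: str) -> list[str]:
    """Naively split a "from"-style etymology on ", from "."""
    text = _strip_leading_from(etymology)
    return [stage.strip() for stage in text.split(", from ")]


if __name__ == "__main__":
    print(split_lt_chain("From French acide < Latin acidus (“sour, acid”) < aceō (“I am sour”)"))
    # ['French acide', 'Latin acidus (“sour, acid”)', 'aceō (“I am sour”)']

    # "from" used as ordinary prose is mistaken for a stage boundary:
    print(split_from_chain("From Old French chaufer, from which also modern French chauffer"))
    # ['Old French chaufer', 'which also modern French chauffer']
```

A real parser would also have to cope with glosses, "+" compounds and "See ..." notes; the sketch only shows why a dedicated symbol (or a template) is easier for a machine to split than an English word.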
Support Maro 19:21, 11 February 2011 (UTC)[reply]
Support The uſer hight Bogorm converſation 11:11, 14 February 2011 (UTC)[reply]
Support Ivan Štambuk 16:47, 14 February 2011 (UTC)[reply]
Support —Stephen ^(Talk) 23:15, 14 February 2011 (UTC) I don’t know about parsing difficulties or even why parsing should be needed, but IMO the use of < makes an etymology much more readable.[reply]
Support — Stevey7788 18:06, 17 February 2011 (UTC) Concise and elegant, and most people would quickly figure out what this means.[reply]

Preference 2

I prefer the use of "from" over the use of "<".

I prefer "from", but sometimes "which is from" or the like, whichever fits best in the paragraph, over "<". But I don't mind "<" terribly.—msh210℠ (talk) 09:46, 7 February 2011 (UTC)[reply]
Support Ƿidsiþ 09:55, 7 February 2011 (UTC), in general.[reply]
Support H. (talk) 11:52, 7 February 2011 (UTC) Yes please, this has been bothering me for a while; Wiktionary is not paper, remember? Furthermore, I never understood why one would use < but not >. Also, please begin with ‘From’ as well, in the same line of thinking: it is a sentence, not a telegraph message.[reply]
You are entitled to your preference, but the use of "<" has not much to do with whether Wiktionary is paper. You seem to imply that the people who prefer "<" do so to make the etymology shorter, but that is not necessarily the case, not with me anyway. For me, it is all about the ease of visual parsing. --Dan Polansky 12:16, 7 February 2011 (UTC)[reply]

I always thought that it's < because it looks like an arrow pointing from oldest to newest, showing the direction of progression. —Internoob (Disc•Cont) 03:00, 9 February 2011 (UTC)[reply]

On the subject of Hamaryns' point, we should use "from the Language term": "from the English water" sounds like it's talking about the term, whereas "from English water" sounds like it's talking about the liquid. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:05, 14 February 2011 (UTC)[reply]
Support —CodeCa t 11:58, 7 February 2011 (UTC)[reply]
Support It's not a huge thing for me, but from is more obvious and less jargony.--Prosfilaes 20:49, 7 February 2011 (UTC)[reply]
Support the less-than sign is really confusing to most people. It's a far better idea to write it out. -- Prince Kassad 20:53, 7 February 2011 (UTC)[reply]
Support — is the principle that Wiktionary is not paper applicable here? It has the space to spell words out rather than to use unclear abbreviations. - -sche 05:14, 8 February 2011 (UTC)[reply]
Support Daniel. 15:16, 8 February 2011 (UTC) I personally feel more comfortable with "from", but I don't mind if people choose any of these possibilities. --Daniel. 15:16, 8 February 2011 (UTC)[reply]
Support —Ruakh_TALK 16:08, 8 February 2011 (UTC)[reply]
Support DCDuring TALK 16:23, 8 February 2011 (UTC)[reply]
Support — I prefer "from" to "<", but msh210's proposal in #Preference 1 above would be much better. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:05, 14 February 2011 (UTC)[reply]
Support – for reasons mentioned above, notably more understandable by casual users (who are already a bit confused by etymologies) and I find “from” easier to read. —Nils von Barth (nbarth) (talk) 11:18, 14 February 2011 (UTC)[reply]
Support Equinox ◑ 11:20, 14 February 2011 (UTC) But I'm not really picky; both seem all right. Equinox ◑ 11:20, 14 February 2011 (UTC)[reply]
Support Bequw → τ 13:41, 14 February 2011 (UTC)[reply]
Support, we have no need for space-saving devices here. bd2412 T 16:40, 14 February 2011 (UTC)[reply]
Support (more explicit). Lmaltier 17:49, 14 February 2011 (UTC)[reply]
Support, because it is easy to understand Kinamand 07:14, 15 February 2011 (UTC)[reply]
Support ---> Tooironic 07:32, 15 February 2011 (UTC) Me too.[reply]
Support --Makaokalani 08:44, 15 February 2011 (UTC)[reply]
Support Eclecticology 09:51, 16 February 2011 (UTC) Wiktionary is not paper. Older dictionaries that used "<" did it to conserve space as much as anything else. They also had conveniently and predictably located glossaries of the abbreviations and symbols that they used.[reply]
Support Neskaya … gawonisgv? 18:54, 16 February 2011 (UTC) I dislike the less-than symbol in general. Additionally, I would like to draw attention to the fact that "from" is far clearer than the less-than symbol for users of screen readers or other assistive technology, such as a Braille display. In fact, I would say that at this point I strongly oppose usage of the less-than symbol, due to the accessibility concerns. A screen reader would read out the etymology section with 'less than' as the representation of the symbol, which makes utterly no sense in terms of etymology. Less than language name? Seriously? --18:54, 16 February 2011 (UTC)[reply]
I'd like this to be customisable, with a very slight preference for "from" as the default. Thryduulf (talk) 15:43, 30 March 2011 (UTC)[reply]

Preference 3

I am indifferent or indecisive about the use of "from" vs the use of "<".

Support Vahag 13:57, 7 February 2011 (UTC) I use "from" in one- and two-member etymologies, but "<" in longer ones. Long chains with < are easier to read. --Vahag 13:57, 7 February 2011 (UTC)[reply]
Support —Internoob (Disc•Cont) 03:01, 9 February 2011 (UTC) Per above.[reply]
I'm not really a fan of either method. In my opinion, a template would be better. --Yair rand (talk) 05:26, 9 February 2011 (UTC)[reply]
Isn't that orthogonal to the question under discussion?--Prosfilaes 19:49, 9 February 2011 (UTC)[reply]
Not really. If "from" and "<" were identical in practice, and we had templates to display them, then any user would be able to see what he wants to see. (For example, I may see "from" and you may see "<".) --Daniel. 16:10, 10 February 2011 (UTC)[reply]
There would necessarily be a default view, though, so this discussion would not be moot.—msh210℠ (talk) 06:41, 11 February 2011 (UTC)[reply]
Support SemperBlotto 11:27, 14 February 2011 (UTC) and I think that people who add etymologies that state that a word came from language1 via language2 via language3 should be prepared to supply evidence. SemperBlotto 11:27, 14 February 2011 (UTC)[reply]
Support Conrad.Irwin 20:01, 14 February 2011 (UTC). I am a fan of standardising on one or the other; but I don't really mind which is chosen. Conrad.Irwin 20:01, 14 February 2011 (UTC)[reply]
Support Whatever becomes the standard, I'll accept, no preference. --Anatoli 22:20, 14 February 2011 (UTC)[reply]
Support Whatever becomes the standard, I'll accept, no preference. --Jcwf 00:45, 15 February 2011 (UTC)[reply]
Support Jamesjiao → ^{T ◊ C} 06:38, 15 February 2011 (UTC) This depends on the situation. Overall, I don't have any preference. So will see how this poll goes.[reply]
Support -Atelaes λάλει ἐμοί 10:57, 20 February 2011 (UTC) I think '<' does make it slightly easier to parse at a glance (but not enough, so we really need to think of something more), while 'from' is more intuitive and I think, a tad more professional. Using a hodge-podge of the two is confusing and ugly, and so I support a standard, whatever that standard may be.[reply]
Again, I bring up the accessibility concern with '<': while it may be obvious enough to sighted users, anyone who reads Wiktionary with a screen reader will not be able to parse it easily. --Neskaya … gawonisgv? 18:13, 21 February 2011 (UTC)[reply]

Discussion

A few thoughts:

While > (greater than) is elegant (terse and symmetric with <), it can be rather confusing; I’m not sure how much it is used.
Various alternative presentations of etymologies (and cognates and the like) are possible and could be interesting – one can imagine giving etymologies as lists (rather than as prose, since lists make scanning easiest), timelines, graphs (of ancestors and cognates), time-lapse animations (spread and evolution of a word across time, space, and other languages), etc., etc. – though for now these are science fiction (other than timelines of usage).
More formal templates, as msh210 discusses above, would help make etymologies more computer-parsable – currently they are mostly formulaic {{etyl}} + {{term}}, with the main variation being “from” vs. “<”, as Dan is discussing here.

—Nils von Barth (nbarth) (talk) 11:32, 14 February 2011 (UTC)[reply]
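As a rough illustration of the machine-readability point, the following Python sketch extracts the proposed {{from}} templates from this discussion into (language, term) pairs. Those templates are a proposal, not existing ones, and the helper names here are hypothetical. The sketch assumes templates are not nested and that positional values never contain "=" or "|"; real wikitext would need a proper wikitext parser to handle those cases.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches one template with no nested {{...}} inside it (a simplifying assumption).
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


@dataclass
class Template:
    name: str
    positional: list[str] = field(default_factory=list)
    named: dict[str, str] = field(default_factory=dict)


def parse_templates(wikitext: str) -> list[Template]:
    """Return every non-nested template in `wikitext`, in order of appearance."""
    templates = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name, *params = match.group(1).split("|")
        tpl = Template(name=name.strip())
        for param in params:
            key, sep, value = param.partition("=")
            if sep:  # "key=value" is a named parameter
                tpl.named[key.strip()] = value.strip()
            else:  # anything else is positional; empty ones keep their slot
                tpl.positional.append(param.strip())
        templates.append(tpl)
    return templates


def ancestry(templates: list[Template]) -> list[tuple[Optional[str], str]]:
    """(language code, term) pairs from the proposed {{from}} templates, in order."""
    return [
        (t.named.get("lang"), t.positional[0])
        for t in templates
        if t.name == "from" and t.positional
    ]


if __name__ == "__main__":
    sample = ("{{from|lang=enm|wenden}} {{from|lang=ang|wendan||to turn, go}} "
              "{{etymon form of|form=causative|windan||to wind|lang=ang}}")
    print(ancestry(parse_templates(sample)))
    # [('enm', 'wenden'), ('ang', 'wendan')]
```

Keeping empty positional slots (as in "wendan||to turn, go") preserves the numbering of later positional parameters, so a gloss stays in the same slot whether or not an alternative display form is given.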

- I would like to point out that the less-than/greater-than symbols are NOT screen reader/braille display accessible, and could be very confusing to anyone using such technology. The intended meaning in an etymological context is not clear at first glance, let alone on hearing 'less than'. --Neskaya … gawonisgv? 18:56, 16 February 2011 (UTC)[reply]
What are the effects on machine-parsing en.WT content? how might this affect the dbpedia-folx efforts to build ontology parsing of en.WT content? are there similar or developing standards across Wiktionary languages? What specific problem does this address? just some of my thoughts. - Amgine/^talk 23:01, 14 February 2011 (UTC)[reply]

Wiktionary:"/Templates

I created that and it was speedy deleted without anyone asking me. Well, it is not used as a redirect, but like most shortcut pages, it is meant to be used in the search box. I regularly want to look up one of those templates, and it is easier to type ‘WT:"/Templates’ than ‘Wiktionary:Quotations/Templates’, let alone remember the name of the latter page. So why not keep it? H. (talk) 11:14, 7 February 2011 (UTC)[reply]

Because we don't allow just anything to redirect to anything. And like you say, it's your personal redirect. Another admin may disagree with me. Have you considered just adding the link to your user page? Oh, having said that, WT:" already exists. Sigh, shoot (that's a euphemism). Mglovesfun (talk) 12:29, 7 February 2011 (UTC)[reply]

Definitely agree with Mg, Wiktionary namespace shouldn't have shortcut redirects. If you want to redirect to something in the Wiktionary namespace, a WT namespace page should be created for the purpose, maybe WT:QT for Quotations/Templates (which seems like a badly titled page to me). - [The]DaveRoss 14:43, 7 February 2011 (UTC)[reply]

I'm not saying that no redirects should exist, just that you want to avoid ambiguity. For WT:QUOTE to redirect to WT:Quotations seems fine, as it's obvious what you're linking to. Having said that, we use a lot of initialisms which are only obvious once you click on the page. Still, that's not a reason to allow anything that someone cares to type in and hit enter. Mglovesfun (talk) 14:47, 7 February 2011 (UTC)[reply]

But [[WT:"]] seems pretty obvious, and this is an obvious extension of it.—msh210℠ (talk) 21:12, 7 February 2011 (UTC)[reply]

Wiktionary: and WT: namespaces coincide now, TDR. (See, e.g., [[WT:Criteria for inclusion]] and [[Wiktionary:CFI]].) If you mean that only short titles should redirect, well, there are loads and loads of redirects within the Wiktionary: namespace with long titles. Also some with mixed long and short titles (which I mention in case that's your objection), like [[WT:Editable ELE]]. Undelete or, if restored, keep.—msh210℠ (talk) 21:12, 7 February 2011 (UTC)[reply]

Oh, I had forgotten that we made WT an alias instead of just a pseudo-namespace. - [The]DaveRoss 21:50, 7 February 2011 (UTC)[reply]

Wiktionary:Requests for moves, mergers and splits

Badly needs input. TBH I'm just gonna grant some of these as unopposed (1-0) unless someone objects. Mglovesfun (talk) 11:30, 8 February 2011 (UTC)[reply]

Go4it! SemperBlotto 15:13, 8 February 2011 (UTC)[reply]

Appendix-only constructed languages

Subcats of Category:Appendix-only constructed languages should not also be subcats of Category:All languages, right? I mean, Category:APL language (for example) should not show up among all the real languages in Category:All languages, right?—msh210℠ (talk) 16:09, 9 February 2011 (UTC)[reply]

I would advise deleting Category:Appendix-only constructed languages, because its title is too technical, in favor of creating "Category:Minor constructed languages" and "Category:Computer languages"...

As for your question, yes, APL should be a member of Category:All languages. Why not? It's a constructed language too, per constructed language. --Daniel. 16:26, 9 February 2011 (UTC)[reply]

Re "why not": Because Category:All languages is not a topical category where we can debate whether something "is a constructed language too" and belongs in it. It's a lexical category, and we've decided to exclude APL (and everything else in Category:Appendix-only constructed languages) from the lexicon.—msh210℠ (talk) 16:38, 9 February 2011 (UTC)[reply]

I think having Category:Appendix-only constructed languages in Category:All languages is sufficient. Mglovesfun (talk) 16:41, 9 February 2011 (UTC)[reply]

Having all languages in Category:All languages makes them easier to find. If I want to find Category:Klingon language, or simply want to know whether we have a category for Klingon, I would like to have the possibility of browsing through the "K" part of Category:All languages. If any language is deliberately excluded from Category:All languages, I would at least expect this fact to be announced somewhere, like "This category does not contain certain minor or computer languages, which may be found in this other category." --Daniel. 16:50, 9 February 2011 (UTC)[reply]

Another 2p in favor of including, well, all languages under Category:All languages. I'm actually a bit surprised this is even being discussed. The category name is, after all, all languages, and the category page states quite clearly that This category contains, indeed, all languages, or rather, all language names (in English). -- Bemused, Eiríkr Útlendi | Tala við mig 19:50, 9 February 2011 (UTC)[reply]

I agree with User:Eirikr and User:Daniel. — it is counter-intuitive to include languages only somewhere other than Category:All languages. - -sche 08:31, 10 February 2011 (UTC)[reply]

Above all, most of the categories from Category:Appendix-only constructed languages should be deleted, per Wiktionary:Votes/pl-2010-10/Disallowing certain appendices. For instance, Category:Klingon language contains Appendix:Klingon/ghommey, which should not exist per the vote. After the subpages are deleted, the category for Klingon gets pointless. --Dan Polansky 09:14, 10 February 2011 (UTC)[reply]

I was under the impression that that vote was about fictional universe appendices, not fictional language appendices. --Yair rand (talk) 09:26, 10 February 2011 (UTC)[reply]

Oops, you are right. So maybe we should have another vote that extends the treatment also to appendix-only languages. --Dan Polansky 12:17, 10 February 2011 (UTC)[reply]

The treatment of fictional universe appendices is still relatively unclear. However, I can safely assume that, if appendix-only language appendices eventually should follow the rule of being defined in lists and never in individual entries, then the following facts about Category:Klingon language should be taken into consideration:

We would still eventually have many lists of Klingon words, possibly alphabetically (Appendix:Klingon/A, Appendix:Klingon/B, Appendix:Klingon/C...), per part-of-speech (Appendix:Klingon/List of nouns, Appendix:Klingon/List of adjectives), and/or per subject (Appendix:Klingon/List of animals, Appendix:Klingon/List of clothing), thus justifying the existence of Category:Klingon language.
We would still eventually have appendices for certain pieces of information that are common for other languages as well, such as Appendix:Klingon Swadesh list and Appendix:Klingon given names, thus justifying the existence of Category:Klingon language.
We would still perhaps have Klingon templates (for example, to display multiple scripts), thus justifying the existence of Category:Klingon templates and Category:Klingon language.
Not to mention Category:Klingon derivations and Wiktionary:Requested entries (Klingon).
Category:Klingon language fits an intuitive and organized category tree of "all languages", and displays, or should display, relevant information such as script, family, a link to an entry, links to policies, subcategories, a link to a Wikipedia article and a useful warning about Klingon being forbidden in entries, so it is justified.

That's it. --Daniel. 14:00, 10 February 2011 (UTC)[reply]

I think the problem is that Category:All languages is misnamed: it doesn't contain entries like English and French, as its name implies, but rather categories like Category:English language and Category:French language. A more accurate name might be "Category:All entries by language", or "Category:All language categories". —Ruakh_TALK 23:41, 10 February 2011 (UTC)[reply]

In my opinion, "Category:All languages" is not a bad name, but for the sake of clarity...

I most certainly will perpetually oppose the name "Category:All entries by language", because Category:Portuguese language, Category:English language, Category:French language, etc. contain not only entries but much more information.
I would probably support a change to "Category:All language categories" or simply "Category:Language categories".

--Daniel. 23:45, 10 February 2011 (UTC)[reply]

Category:Klingon language is in this category, albeit not directly, but via a subcategory. Mglovesfun (talk) 23:53, 10 February 2011 (UTC)[reply]

Category:Klingon language is (and always has been as far as I remember) a direct member of Category:All languages; check again. --Daniel. 23:57, 10 February 2011 (UTC)[reply]

Re: "Category:Portuguese language, Category:English language, Category:French language, etc. contain not only entries but much more information": O.K., but let's focus on one problem at a time. ;-) —Ruakh_TALK 00:06, 11 February 2011 (UTC)[reply]

@Ruakh: "Category:All entries by language" and "Category:Entries by language" sound good to me. These categories mostly contain entries; the only other things they contain are indexes and appendixes, but these are very few compared to entries, so the misnaming is not too bad. If we want to be more accurate, we can have "Category:Content by language". --Dan Polansky 14:24, 11 February 2011 (UTC)[reply]

Language categories such as Category:English language and Category:Portuguese language also contain templates, a relatively large number of rhymes pages, a few categories or pages of requests for attention, requested entries, etc., and citations. I appreciate accuracy, and I also appreciate your suggestion of "Category:Content by language". --Daniel. 15:47, 11 February 2011 (UTC)[reply]

I like "Content by language".—msh210℠ (talk) 15:57, 11 February 2011 (UTC)[reply]

Poll: Choosing topical categories

I would like to know the opinions of other Wiktionarians about the problem mentioned in WT:BP#How to choose topical categories:

Most entries fall into the scope of various redundant topical categories simultaneously. For example, German Shepherd may be categorized into Category:Herding dogs, Category:Dogs, Category:Canids, and so on. Should "German Shepherd" be a member of all these categories? Or, perhaps, should it simply be a member of only the narrowest one, that is, Category:Herding dogs, and all the others would be merely implied?

This poll is not about deciding categorization of topical categories, or deciding names of topical categories, because I believe these are complex and separate problems to be discussed eventually. Nonetheless, I also believe our category tree is good enough to undergo this project of becoming more consistent by actually deciding and letting editors know where they should categorize entries.

Feel free to make other proposals and comments. Thank you for your attention and your input. --Daniel. 18:04, 10 February 2011 (UTC)[reply]

Preference 1: Narrowest topical categories only

I prefer all entries as members of only the narrowest topical categories available.

If you agree 100% with this practice, or if you agree in essence with it but have ideas of some situations where it would be better to use less narrow categories, please vote for this option. Feel free to elaborate your thoughts.

Examples:

German Shepherd should be a member of Category:Herding dogs, but not of Category:Dogs, Category:Canids, Category:Mammals, Category:Vertebrates, Category:Animals or Category:Nature.
nimbostratus should be a member of Category:Clouds, but not of Category:Weather or Category:Nature.
Ragnarok should be a member of Category:Norse mythology, but not of Category:Mythology or Category:Culture.

Support Daniel. 18:04, 10 February 2011 (UTC)[reply]
I, personally, prefer the system of using only the narrowest category, because:
1. It avoids redundant superpopulated categories such as Category:Nature with thousands of terms from Category:Animals, Category:Plants and Category:Weather. Feel free to correct me, but I believe they don't have any practical use; that is, I don't remember or assume that any particular target group of Wiktionary users would need, want or appreciate the existence of superpopulated topical categories. (And these possible wide lists have the potential to be scanned by external tools anyway, if anyone bothers to mention or create such a tool.)
2. It is easier to visually scan the list of categories from an entry that avoids that redundancy. For example, the list of categories of the current revision of "dog" randomly contains various versions of Category:Canids, Category:Mammals, and Category:Animals, and I wouldn't call it the most comfortable piece of text to be read and understood.
3. It helps to organize existing categories, by keeping a balance of populated "narrow" categories, and underpopulated "wide" categories. For example, with this system, if any editor finds any name of a specific animal on Category:Animals (or Category:fr:Animals, Category:pt:Animals, etc.), he/she would automatically know that these entries need to be recategorized under Category:Insects, Category:Mollusks, and so on. --Daniel. 18:04, 10 February 2011 (UTC)[reply]
Support Mglovesfun (talk) 18:18, 10 February 2011 (UTC)[reply]
I agree that, if we are to have topical categories at all, then entries should be only in the narrowest available. Otoh, if we are to have topical categories, IMO the categories should be as broad as feasible, so that the examples given for this option are not in accord with my view. (Even the wording of the option, "I prefer all entries as members of only the narrowest topical categories available", while technically correct, is misleading in that someone might mistakenly infer that I prefer having topical categories.)—msh210℠ (talk) 18:40, 10 February 2011 (UTC)[reply]
Support —Internoob (Disc•Cont) 00:38, 11 February 2011 (UTC) I actually thought that that was (more or less) already one of those de facto policies. —Internoob (Disc•Cont) 00:38, 11 February 2011 (UTC)[reply]
Support, but you need to keep in mind that a term like 'narrow' only really works if the categories actually behave like a tree. But they don't always; some categories are nested in strange ways and there isn't really a clear definition of what's 'narrower' than something else. —CodeCa t 14:10, 11 February 2011 (UTC)[reply]
As phrased, I cannot agree with either option, but I'm certainly NOT disinterested. I have to agree that there needs to be some organisation to categories. Some are simply a mess, while some are redundant. But the great majority are useful, and so could be made to be more useful with a bit of organisation. However, I can easily think of cases where an entry might need to be at two different levels on the category tree, without implying that the lower level is therefore redundant. -- ALGRIF talk 17:02, 11 February 2011 (UTC)[reply]
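Under the narrowest-only option, broader categories are implied rather than listed. A minimal Python sketch of that idea follows, assuming a hypothetical mapping from each category to its parent categories, built from the examples in this poll rather than from the actual category tree. Because it walks a set of parents instead of assuming a single chain, it also works when a category has more than one parent, which is one way the categories fail to behave like a strict tree.

```python
# Hypothetical parent map built from the examples in this poll; the real
# Wiktionary category tree may differ.
PARENTS: dict[str, list[str]] = {
    "Herding dogs": ["Dogs"],
    "Dogs": ["Canids"],
    "Canids": ["Mammals"],
    "Mammals": ["Vertebrates"],
    "Vertebrates": ["Animals"],
    "Animals": ["Nature"],
    "Clouds": ["Weather"],
    "Weather": ["Nature"],
    "Norse mythology": ["Mythology"],
    "Mythology": ["Culture"],
}


def implied_categories(category: str, parents: dict[str, list[str]]) -> set[str]:
    """All categories reachable upward from `category`, excluding itself
    (unless the map contains a cycle back to it)."""
    seen: set[str] = set()
    stack = list(parents.get(category, ()))
    while stack:
        cat = stack.pop()
        if cat not in seen:  # the seen-set also stops infinite loops on cycles
            seen.add(cat)
            stack.extend(parents.get(cat, ()))
    return seen


def narrowest_only(categories, parents: dict[str, list[str]]) -> set[str]:
    """Drop every category that is already implied by another one in the set."""
    cats = set(categories)
    implied: set[str] = set()
    for cat in cats:
        implied |= implied_categories(cat, parents)
    return cats - implied


if __name__ == "__main__":
    print(sorted(narrowest_only(["Herding dogs", "Dogs", "Canids", "Nature"], PARENTS)))
    # ['Herding dogs']
```

Roughly speaking, the "all topical categories" option corresponds to storing `implied_categories(...)` on each entry, while the narrowest-only option leaves that set to be computed when needed.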

Preference 2: All topical categories

I prefer all entries as members of all the topical categories available.

If you agree 100% with this practice, or if you agree in essence with it but have ideas of some situations where it would be better to disregard certain topical categories, please vote for this option. Feel free to elaborate your thoughts.

Examples:

German Shepherd should be a member of Category:Herding dogs, Category:Dogs, Category:Canids, Category:Mammals, Category:Vertebrates, Category:Animals and Category:Nature simultaneously.
nimbostratus should be a member of Category:Clouds, Category:Weather and Category:Nature simultaneously.
Ragnarok should be a member of Category:Norse mythology, Category:Mythology and Category:Culture simultaneously.

Preference 3: Indifference

I am indifferent or indecisive about the choice of topical categories.

Discussion

I oppose the two simple options offered in this poll. The second option seems unworkable; it would lead to huge top-level categories. The first option is not too bad, except that I can imagine wanting to have an entry in both a finest-level category and its supercategory. The first option is clear and simple, but not necessarily the best one. If I were asked which of the two options I prefer, I would clearly prefer option 1 over option 2, but that does not mean that I want the wording of option 1 to become a policy for Wiktionary practice. --Dan Polansky 18:30, 10 February 2011 (UTC)[reply]
I, too, oppose the two simple options offered in this poll. [[German Shepherd]] should be in Category:Dogs. I think a better approach might be to think about how many entries a topical category should have in order to be useful; Category:Dogs should have (say) the 500 most common/important/whatever dog-itive terms, most of which will also be in subcategories, but less important words should only be in subcategories. (That won't be perfect, because there might be some dog-related words that it's hard to categorize any more narrowly than that, such that they have to go into Category:Dogs no matter how unimportant they are; but I think that's the kind of logic that needs to be applied.) —Ruakh_TALK 16:28, 11 February 2011 (UTC)[reply]

Poll: Deprecation of topical categories

As of this revision, DCDuring has created an additional option on the poll above, to deprecate topical categories. This option since then has been supported by two people.

Since the deprecation of topical categories is a separate subject, and there was no space to formally oppose or just comment on this proposal in the poll above, I'm moving it to this additional poll, naturally with the additional options.

Once more, feel free to make other proposals and comments. Thank you for your attention and your input. --Daniel. 21:00, 10 February 2011 (UTC)[reply]

Option 1: Support deprecation

If you believe that topical categories should be deprecated at this time, with no further topical categories to be created after 20:37, 10 February 2011 (UTC).

Support DCDuring TALK 20:37, 10 February 2011 (UTC)[reply]
Support I have to wonder if anyone even used these. They don't seem to be worth the trouble to me. -- Prince Kassad 20:43, 10 February 2011 (UTC)[reply]
Support strongly. See my comments with the same timestamp in the next subsection.—msh210℠ (talk) 06:36, 11 February 2011 (UTC)[reply]
Support Ƿidsiþ 16:45, 11 February 2011 (UTC) I would quite like a few top-level topical cats -- but if it came to it I would rather have none than have the crazy proliferation of them that we currently have.[reply]

Option 2: Oppose deprecation

Oppose Daniel. 21:00, 10 February 2011 (UTC)[reply]
Topical categories serve the purpose of subdividing Wiktionary into various lists of words related by their subjects, including lists that would normally be found as various types of dictionary: a dictionary of medicine, of law, of technology, etc.

If I want to know terms used in psychology, I use Category:Psychology. If I want to know the "vulgarities" of English and other languages, I navigate through Category:Vulgarities. If I want to know terms used in chess, I use Category:Chess.

Appendices, Wikisaurus pages, and topical categories are valuable tools for the creation and maintenance of various lists of words by their contexts; each of these methods has its own scope and qualities, and I personally like them all. Topical categories are not only dynamic, but easy to populate, easy to find at the bottom of each entry, and readers are used to them after all these years of creating and keeping topical categories. Unlike Wikisaurus, the scope of topical categories is simple and flexible enough to often allow adjectives, nouns, adverbs, etc. together, and unlike appendices, they do not necessarily have additional information that may not be required, such as definitions of each word, or very detailed explanations of usage and existence of groups of words. Topical categories also usually don't need frequent updating to keep up with entries.

Furthermore, the appearance of categories is consistent among all projects, so if I find an interwiki link from Category:Childish to fr:Catégorie:Langage enfantin and sv:Kategori:Barnspråk, I know I will be able to navigate them without having to learn much; and, if I compare the same topical category in two Wiktionaries, I may find the terms that are missing from them. In addition, it's easy to find or create tools to browse topical categories and gather specific contents (such as downloading specifically the dictionary of chess). For these reasons, I oppose their deprecation.

I almost forgot to mention that categories get praised or criticized occasionally, but regularly, at WT:Feedback, so other people are aware of them and use them too. --Daniel. 21:33, 10 February 2011 (UTC)
Terms found in a specialized dictionary (and not generally defined the same way in a general dictionary) are jargon terms, which are properly tagged with an appropriate jargon context tag like {{mathematics}} and categorized in the appropriate category like Mathematics. When I voted in support of getting rid of topical categories, it was in support of getting rid of a category that contains all mathematics-related terms: that's what I think of as a topical category. I oppose getting rid of jargon categories. Thus, the Mathematics category would remain (under my proposal) but contain only jargon terms (and the category description on the category page would indicate as much). Category:Herding dogs would presumably go (unless there's a jargon of the field of herding dogs specifically). So, Daniel, Psychology and Chess will remain categories and will contain, as you put it, "terms used in psychology" and "terms used in chess" — but not terms used outside of but about psychology or chess. Childish and Vulgarities are not topical categories at all: they're categories for registers, not topics. They'd of course be kept. (They should probably be renamed "English...", but that's a separate issue.)—msh210℠ (talk) 06:36, 11 February 2011 (UTC)
With the distinction presented by msh210 in mind, let me clarify that, in my opinion, Category:Greens (to find all names of shades of green in a language), Category:Dogs (to find all names of dog breeds in a language), etc. are as helpful as Category:Chess, Category:Psychology, etc., and for exactly the same reasons. --Daniel. 15:43, 11 February 2011 (UTC)

I may also use topical categories to check if certain terms already exist in Wiktionary and are properly categorized. For example, by reading Category:Chess, I know that we lack many names of strategies of this game; and, by reading Category:Dogs, I know that we lack many names of breeds of dogs. --Daniel. 21:37, 10 February 2011 (UTC)

You seem to have misread my proposed option in the above straw poll. I am not sure why. A category such as Category:Vulgarities is clearly linguistic in its scope. The truly topical categories are essentially part of the overall non-linguistic, encyclopedic tendency within Wiktionary. I consider this tendency destructive of Wiktionary as a language resource and incompetently duplicative of the superior work on encyclopedic topics that is carried out at Wikipedia. It seems to me that we need more work on attestation of old and new terms and their senses and usage, something which is not likely to be covered in WP. DCDuring TALK 22:57, 10 February 2011 (UTC)
I believe I did not misread anything you wrote in this or the above thread. First of all, I've seen, more than once, people using the umbrella of "topical categories" for anything whose naming system basically consists of "language code, then colon, then label". Second, the proposal does not make a clear distinction of categories to be kept or deleted, and their reasons (though you gave some reasons now). And, third, even if I assume that your proposal never meant to attack Category:Vulgarities, my defense of the existence of that category does not contradict my opposition. Apparently you have misread yourself. :p --Daniel. 23:37, 10 February 2011 (UTC)
Oppose The proposal is crazy talk. Categories are a living part of any wiki. They are created by users when needed, and the bad ones can be deleted, given some criteria for good and bad. Banning the creation of categories is unnatural. --LA2 21:46, 10 February 2011 (UTC)
Oppose. --Yair rand (talk) 22:12, 10 February 2011 (UTC)
Oppose Mglovesfun (talk) 23:08, 10 February 2011 (UTC). I don't oppose topical categories. I get annoyed by our lack of regulation, essentially meaning that it's a free-for-all, but I wouldn't want all topical categories deleted. Mglovesfun (talk) 23:08, 10 February 2011 (UTC)
Oppose —Internoob (Disc•Cont) 00:45, 11 February 2011 (UTC) Per Daniel.
Oppose Dan Polansky 08:17, 11 February 2011 (UTC) I oppose deprecation of topical categories. For clarification of what I mean by "topical category": "vulgarities" is not a topical category; "Category:de:Latin derivations" is not a topical category; "mammals" and "vehicles" are topical categories; "geography" can be seen as a topical category, which contains "mountain", but it could be designed and regulated as a category of terms that are only used in geography, in which case "mountain" would not belong to "Category:Geography". While I do support having topical categories, I do not support any arbitrary level of their granularity: Category:Herding dogs could be too specific. --Dan Polansky 08:17, 11 February 2011 (UTC)
Weak oppose. I don't think we're doing a great job with topic categories, but they do fall in our remit as a dictionary-cum-thesaurus. If we allow ===Synonyms=== and ===Hypernyms=== and [[Wikisaurus:foo]] and so on, then clearly we're not just mapping from words to meanings, but also from meanings to words; and a topic-category system, done well, is a key component of that. —Ruakh_TALK 13:55, 11 February 2011 (UTC)
Oppose. I find categories a very useful tool. For instance, I sometimes use a topic category to fill the gaps .. leading to lots of new vocab. (as stated above). I feel that I must remind users that the lemmings also do this. But with paper there are restrictions, so they end up with all the usual limited range of stuff: Weights and Measures, Clothes, Parts of the body, Cars, Garden tools, and whatever else comes into their heads. I think the categories can be really useful. Maybe a bit of culling from time to time, but with care. One contrib's idea of a bad category is almost sure to be another's useful portmanteau. How about checking with the contrib who set a category up before culling it? -- ALGRIF talk 16:51, 11 February 2011 (UTC)
Oppose. They are useful for contributors, but also for readers, especially when they want to find a word they have known, but forgotten. Lmaltier 21:22, 11 February 2011 (UTC)
However, if they have to be useful to readers, category names should be very easy to understand (not only by specialists). Names such as fr:Dogs should be changed to something such as Dogs in French. And categories such as Chaetodontidae would be useless (but a category such as Butterflyfish might be useful). Lmaltier 22:16, 11 February 2011 (UTC)
Oppose. I am of the same opinion as User:Mglovesfun and User:Ruakh. We should discuss, improve or delete single categories, but not delete all categories categorically. - -sche 21:42, 12 February 2011 (UTC)
Oppose. Mend it, don't end it. bd2412 T 16:42, 14 February 2011 (UTC)

Preference 3: Indifference

I am indifferent or indecisive about the existence or deprecation of topical categories.

Discussion

If this turns out to get enough support, we should start a formal VOTE to implement this change. -- Prince Kassad 21:14, 10 February 2011 (UTC)

What we really need is to standardize these. Right now, anyone can create a topical category, there aren't really any 'rules' or even 'guidelines'. Also, the discussion below is productive - it would be nice to move some things out of the topical realm which aren't topics in the way 'biology' and 'art' are. Mglovesfun (talk) 12:04, 14 February 2011 (UTC)

Flood Flag

In the spirit of getting things done, I propose we remove the whole Flood Flag "process" for administrators. I propose that administrators be able to use the flag at their discretion, with the understanding that while it is enabled they will make only edits which are non-controversial, and that they use a verbose reason when they flag themselves, e.g. "adding lots of glosses to trans sections". I don't see any reason why we need to have two people agree and wait 48 hours before someone is allowed to do some work; many requests get done before the approval process can complete, flooding RC needlessly. - [The]DaveRoss 22:02, 10 February 2011 (UTC)

Since I'm the most contrary person here, we could have a simple procedure: Check with User:DCDuring {;-)}. DCDuring TALK 23:01, 10 February 2011 (UTC)

We can compromise and say that everyone must inform DCDuring, the preferred mechanism for doing so being an informative reason in the flag assignment. - [The]DaveRoss 23:30, 10 February 2011 (UTC)

Yes totally agree, perhaps move Wiktionary:Requests for flood flag to Wiktionary:Flood flag/requests and create Wiktionary:Flood flag. Then admins could simply 'inform' others of their use of a flood flag, instead of 'requesting it'. Mglovesfun (talk) 00:15, 11 February 2011 (UTC)

@TheDaveRoss: I think we might as well keep the page, and have admins comment there when they flag themselves, just so people who care can keep the page on their watchlist. We can easily drop that requirement at some point if we decide in future that it's too onerous and not necessary. —Ruakh_TALK 16:15, 11 February 2011 (UTC)

That seems reasonable, a simple "post here before you flag yourself" is not an undue burden and will keep the usage somewhat transparent. - [The]DaveRoss 21:51, 11 February 2011 (UTC)

placenames up for deletion

It was recently decided that placename entries that don't meet our CFI should be tagged, and then deleted after a month if they haven't been sufficiently improved. May I make the following queries, assumptions, suggestions...

When CFI says "Information about grammar, such as the gender and an inflection table." then just the gender is sufficient.

Two different translations that are "not spelled identically with the English form" count as the necessary two criteria.

It takes only a few seconds to tag an entry but several minutes to improve it such that it meets CFI. I would like the initial month to be extended, and for people tagging entries to limit the number that they tag each day. Some of us who want these entries to stay also have other things to do that are more useful to our users (such as adding every Latin word in the Vulgate).

Cheers. SemperBlotto 20:01, 11 February 2011 (UTC)

AFAICT the policy is fairly clear on what entries need to meet CFI, but not what to do when they don't, since presumably the information needed for the entry to pass always exists. 30 day rule sounds good to me, and like RFV, if an entry is deleted then restored and such information is quickly added, then it will meet CFI. Mglovesfun (talk) 20:09, 11 February 2011 (UTC)

If an entry is known not to meet our criteria for inclusion, as these are, then I think it's already pretty generous to tag them for a month in the hopes that someone will edit the entry so it does meet the criteria. —Ruakh_TALK 21:11, 11 February 2011 (UTC)

If it can meet the inclusion criteria, but the appropriate content hasn't been added yet, I don't see why not to keep it there for at least a month. What's the difference between this and entries that don't have citations added yet? --Yair rand (talk) 21:14, 11 February 2011 (UTC)

An entry that doesn't have citations hasn't been demonstrated to meet the CFI: maybe it meets the CFI, maybe not. A month of waiting for citations has been deemed sufficient time to rule out the former case. By contrast, a placename entry that doesn't have linguistic information is known not to meet the CFI; it's an entry that shouldn't exist. If it was entered before the new rule was passed, then some leniency may be justified, and some time given to interested editors to bring it up to par; if not, I think speedy-deletion is the best course. (Even in the former case — this is a low bar. If we give thirty days to track down citations for obscure words, then thirty days should be way more than enough to track down the gender of a common placename!) —Ruakh_TALK 21:29, 11 February 2011 (UTC)

It's harder for English place name entries though. English has no gender. -- Prince Kassad 21:49, 11 February 2011 (UTC)

I haven't heard any hue and cry about the discrimination against English place names. Let's let sleeping dogs lie. DCDuring TALK 23:36, 11 February 2011 (UTC)

That's not how I see it. For citations, we care about whether the cites are listed in the entry, not whether they exist at all. We only have access to an extremely small amount of durably archived works, and a lot of the words that get deleted through RFV (especially those affected by WT:BRAND and such) almost certainly have the necessary cites somewhere in the world. I don't see how placenames are different. --Yair rand (talk) 21:52, 13 February 2011 (UTC)

Re: "I don't see how placenames are different": They're different in that they were explicitly voted to be different. To quote from WT:CFI: "A place name entry should initially include at least two of the following: […] " (emphasis mine). —Ruakh_TALK 12:15, 14 February 2011 (UTC)

The first is true. If a language has only gender but no inflection of proper nouns, the gender by itself is sufficient. The second isn't, as far as I can tell; translations by themselves will never fulfill the criteria. -- Prince Kassad 21:24, 11 February 2011 (UTC)

I agree with PK.—msh210℠ (talk) 08:30, 13 February 2011 (UTC)

Do alternative spellings and synonyms count as information about grammar? —Internoob (Disc•Cont) 21:28, 12 February 2011 (UTC)
Also, derived terms frequently show demonyms, so I think that they should count toward information about grammar as well. —Internoob (Disc•Cont) 21:54, 12 February 2011 (UTC)

I don't think alternative spellings, synonyms, and derived terms count as "information about grammar". By the way, note that the requirement is "Information about grammar, such as the gender and an inflection table" (emphasis mine). By my reading, the "such as" is not intended to suggest that you can provide some random sampling of grammar information and call it good, but rather, to acknowledge that different languages have different types of grammar information. If place-names in a given language have gender and case inflections, then you have to supply both to fulfill this requirement. Most English place-names don't really have grammatical information worth mentioning, but some do: I think a usage note at [[Sudan]] explaining usage with and without "the" would count for this. That's just me, though. —Ruakh_TALK 03:26, 13 February 2011 (UTC)
Like Ruakh, I don't think alt.sp.s, 'nyms, and derived terms count as "information about grammar", but think a usage note about "the" for the Sudan should.—msh210℠ (talk) 08:30, 13 February 2011 (UTC)

Okay, but what's special about declension that's not special about a demonym, for instance? What makes a word's genitive form more valuable than a word's demonym? Some of them are not intuitive at all, like Malagasy or Filipino. And why is a translation into another language that's not spelled identically different than a synonym or alternative form that is not spelled identically in the same language? It seems a bit selective to me. —Internoob (Disc•Cont) 03:45, 14 February 2011 (UTC)

Re: "It seems a bit selective to me": Indeed. That was one of the reasons I opposed it. —Ruakh_TALK 12:18, 14 February 2011 (UTC)

Does Bridgetown count as multiple-word? —Internoob (Disc•Cont) 22:43, 12 February 2011 (UTC)

- It needs a space to count as "multiple words", and this entry is written together, so it would count as a single word. -- Prince Kassad 22:55, 12 February 2011 (UTC)

Coming in late to the debate, this proposal to exclude place names unless they satisfy artificial criteria seems utterly silly. It looks like the kind of thing designed to engage contributors in unnecessary work. These are mostly places that can easily be encountered in a person's reading. There is no credible doubt that Bridgetown exists as a city. Etymology helps us to understand a word, and having it for place names is an interesting but not necessary historical detail, and there is no guarantee that two identical place names will have the same etymology. For grammar information it may be sufficient to know that a name is invariant, and in the absence of other information that should be implicit by default. Eclecticology 10:55, 13 February 2011 (UTC)[reply]
- Yes, it's an ongoing fight against deletionists. Some of us still have the hope that "all words in all languages" will eventually prevail. SemperBlotto 10:59, 13 February 2011 (UTC)

This thread seems to be a response to the activity of msh210 of adding {{placename/box}}[[Category:Place names needing additional information February]] to entries of geographic names, an activity that seems to have started on 11 February 2011. {{placename/box}} that msh210 uses to tag entries for geographic names was created by DAVilla on 10 February 2011.
I would like the tagging to stop, especially for entries that, while not currently meeting the CFI requirements, are likely to be able to meet them, such as "Barcelona". Alternatively, the tagger may, for each added tag, add the required information to at least one geographic name, thereby making a genuine contribution to the usefulness of Wiktionary for its users.
If the tagging does not stop, we may need to modify CFI from saying "A place name entry should initially include at least two of the following:" to "A place name entry should be able to include at least two of the following:". See also Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2. --Dan Polansky 12:01, 13 February 2011 (UTC)
That's {{subst:placename}}, fwiw.—msh210℠ (talk) 15:46, 14 February 2011 (UTC)

Sorry, but no. This requirement was a bad one, but its intended purpose was to prevent useless place-name entries from being added, by adding a small burden on their creators to make them useful. Shifting that burden onto people who are merely trying to implement our primary, defining policy document doesn't make sense. Eliminating the requirement would be one thing, but that sort of "hacking around it" is just the worst of all worlds. —Ruakh_TALK 15:11, 13 February 2011 (UTC)
Re shifting: Right on.—msh210℠ (talk) 15:46, 14 February 2011 (UTC)

I do not see how deleting the Dutch entry for Barcelona is doing useful work. The Dutch entry now says that (a) there is a term in Dutch "Barcelona", and that (b) it is translated into English as "Barcelona". If the policy gets modified as I propose, the person considering tagging an entry will have to ask themselves one question: is it likely that this entry can carry useful lexicographical information? In the case of Dutch "Barcelona", the answer is "yes, it is actually certain", and the person can proceed to other work. This question does not seem to put too much burden on the tagger. It is like with the attestation requirement: if a term is very likely to be attestable, no one should be tagging it for RFV; RFV should only receive terms whose attestability is questionable. --Dan Polansky 15:14, 13 February 2011 (UTC)

Please see my above comments. According to this rule, a place-name entry without enough information is like a term that has already failed RFV. I'm not saying that deletion is "useful work", just that it's implementing a policy that was approved by the community. (Personally, I'd be happy with deleting all place-names, but the community doesn't support that, so I don't.) —Ruakh_TALK 16:48, 13 February 2011 (UTC)
- I do admit that the tagging is consistent with the literal and strict reading of the current policy for geographic names. That is why I have mentioned the option of changing the policy from "A place name entry should initially include at least two of the following:" to "A place name entry should be able to include at least two of the following:". My previous post defended the usefulness of such a change against the charge that the change would place too much burden on taggers. --Dan Polansky 20:08, 13 February 2011 (UTC)

I think the above discussion has pretty much proved that the WT:Votes/pl-2010-05/Placenames with linguistic information 2 vote was a mistake. Most of the voters probably assumed that no one would go out of their way to try to eliminate hundreds of valuable placename entries as the policy allows. (I'm starting to wonder how much longer before RFD is clogged by entries killed by the bucket problem...) The policy clearly needs to be modified. --Yair rand (talk) 21:52, 13 February 2011 (UTC)

Ideally this project should include all place names. The policy amendment was clearly a terrible idea. As is typical with such bad policies there are always people for whom the literal interpretation of the policy is more important than the health of the project. In most cases the criteria in the added list could be easily met, but to insist that the originator include them does nothing but create dissension. If someone thinks these details are so important he should accept the responsibility. Better still, this excuse for a policy should be completely revoked. 07:21, 14 February 2011 (UTC) — This unsigned comment was added by Eclecticology (talk • contribs).
- I assume I should take offense at that. Note that in tagging the city names I tagged, I had no intention (as I mentioned to SB on my talkpage) of deleting the pages after a month. In fact, within the month, I was planning to go (and would still be planning to go, except that recently proposed votes may make such action not as necessary) back to each entry and add pronunciation for any I knew the pronunciation for (which is very few), and if that made the entry meet the CFI then I would detag it. (If an entry remained so tagged for a while I'd certainly delete it, though.) The category of entries so tagged is also a cleanup category. I didn't think that my tagging the entries would create such a horrified response: I thought it was the natural outcome of our new policy, which certainly had community approval: frankly, I'm surprised.—msh210℠ (talk) 15:46, 14 February 2011 (UTC)

One practice that is bound to create dissension is sticking a template and saying that the article will be deleted unless certain criteria are met. It calls up the apprehension that if it can be done on one article it can be done on any, and someday it may be about something that I care more about. Deletion without warning is a thoroughly unacceptable way of doing things, but generates a lot less noise. Natural outcomes depend on what you want to accomplish. Is it improving bad articles or deleting them? Eclecticology 00:32, 15 February 2011 (UTC)[reply]
Since I tagged them rather than deleting them without warning (which, as you say, might be less noisy, and which would be justified TBH), obviously I'm not solely interested in their deletion. If they can be improved so as to satisfy our criteria for inclusion, that's great. Some have been already.—msh210℠ on a public computer 17:53, 16 February 2011 (UTC)

Since no one seems to support this policy anymore, I've created a vote proposing that it be revoked: Wiktionary:Votes/pl-2011-02/Remove "Place names" section of WT:CFI. Please take a look, make any necessary improvements, etc. —Ruakh_TALK 14:19, 14 February 2011 (UTC)

The policy seems fairly good to me. It explicitly programs on (refers to; is expressed in terms of) the rationale for inclusion of geographic names: their ability to carry useful lexicographical information. It says there is a consensus that geographic names should be largely included. A minor tweak to the policy should do, so I have created an alternative vote: Wiktionary:Votes/pl-2011-02/Relaxing CFI for geographic names. Proposals for improved wording are welcome, especially here in Beer parlour, and on the talk page of the vote. --Dan Polansky 14:49, 14 February 2011 (UTC)

There is a distinction to be made between what Geographic names to include, and what is included in articles about geographic names. As long as it is about the latter it doesn't belong in CFI. Eclecticology 00:55, 15 February 2011 (UTC)[reply]

In my opinion, the best thing to do to this policy would be to make what counts as "information about grammar" more comprehensive for the reasons I stated before. Also, the "multiple-word" thing seems to exclude multiple-word place names whose etymologies aren't intuitive. Other than that, this policy is okay IMHO. —Internoob (Disc•Cont) 04:46, 15 February 2011 (UTC)

We can decide, with or without a vote, that entries that existed before the place name vote passed must be tagged longer than a month before deletion. 3 months, 6 months, a year even? Tagging them is useful to the project, and msh210 does not strike me as a person "for whom the literal interpretation of the policy is more important than the health of the project". The present Dutch entry for Barcelona is a needless stub, only repeating the information in the translation table: "Dutch: Barcelona n". The CFI can be amended to count demonyms or synonyms or two foreign translations.--Makaokalani 09:00, 15 February 2011 (UTC)[reply]

We don't generally disallow entries just because they are repetitions of what's already shown in a translation table. Why should we do that for place names? --Yair rand (talk) 09:11, 15 February 2011 (UTC)

Because a place name entry without linguistic information is encyclopedic. Because names are different from words that have a meaning. --Makaokalani 11:11, 15 February 2011 (UTC)[reply]

How would it be without linguistic information, or encyclopedic? Such entries (those that are duplicates of information already in translations sections) are exclusively linguistic information containing no information about the place itself whatsoever. An English entry for a place name only containing information about the place itself could be encyclopedic, but I don't see how a translation could be. --Yair rand (talk) 11:18, 15 February 2011 (UTC)

Saying that a place name entry without linguistic information is encyclopedic is a very narrow interpretation of the problem. There are certain features of a place name that overlap the two kinds of work. People will run into a place name in a course of ordinary reading and will ask some basic questions that do not require a detailed encyclopedic treatment, most importantly, "Where is it?" For many English names, where pronunciation is self-evident, the only other grammatical information may be that the name is invariant. Eclecticology 00:52, 16 February 2011 (UTC)[reply]

The two votes that have resulted from this thread have started:

--Dan Polansky 15:09, 22 February 2011 (UTC)

I don't think it's all that likely that either of those will pass. We need a vote allowing all attested place names, in my opinion. --Yair rand (talk) 15:53, 22 February 2011 (UTC)

Reading level or frequency for Wiktionary entries.

On the English Language and Usage Stack Exchange site, the question "To what reading level does a specific word like 'verbose' belong?" (under the vocabulary tag) has no satisfying answers. Wiktionary.com seems like it would be the right place for this kind of information. A new section, like Pronunciation, could be added to each word's entry for Reading Level. Also, a Frequency section would be useful for each word's entry.

What measure would you use to evaluate such a statistic? I suppose the most obvious would be to divide the English corpus by target age level (somehow) and then look at the frequency in each sub-corpus for each word. That is not a simple task at all. - [The]DaveRoss 17:36, 12 February 2011 (UTC)

I could see how computational linguistics could generate an approximation for word readability. Generate a readability index score for a passage. (Even better, generate multiple scores and check for consistency, discarding ratings and passages that were excessively discordant, combining the remaining measures into a single score.) Use that score as a score for each word in the passage, possibly excluding high-frequency words. Repeat until one has a statistically adequate number of data points for all words for which the readability score is desired. The process could be used recursively to assign passage readability scores based on the word readability scores to the passages and thence again to the component words.

This would be a substantial project. It would still miss context sensitivity, such as careful explicit definition of an otherwise hard word.

As with most frequency analysis this does not get at specific senses or analyze by lexeme rather than spelling.

Obviously this is a fairly ambitious project. DCDuring TALK 19:49, 12 February 2011 (UTC)
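A minimal sketch of the word-scoring idea above, in Python. It assumes the Flesch–Kincaid grade-level formula as the single passage readability index and a crude vowel-group syllable counter; the stop list `HIGH_FREQUENCY`, the function names, and the `min_samples` threshold are hypothetical illustrations, and the multiple-index consistency check and the recursive re-scoring step are omitted.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the "high-frequency words" excluded from scoring.
HIGH_FREQUENCY = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "was"}

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (y included), at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(passage: str) -> float:
    """Flesch-Kincaid grade level of a whole passage."""
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    words = WORD_RE.findall(passage)
    if not sentences or not words:
        raise ValueError("passage needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def word_readability(passages, min_samples=1):
    """Give each word the mean grade of the passages it appears in."""
    grades = defaultdict(list)
    for passage in passages:
        grade = flesch_kincaid_grade(passage)
        # Count each distinct (lower-cased) word once per passage.
        for word in {w.lower() for w in WORD_RE.findall(passage)}:
            if word not in HIGH_FREQUENCY:
                grades[word].append(grade)
    # Keep only words with enough data points to be worth reporting.
    return {w: mean(g) for w, g in grades.items() if len(g) >= min_samples}


if __name__ == "__main__":
    corpus = ["The cat sat.", "The verbose report was long-winded and tedious."]
    for word, grade in sorted(word_readability(corpus).items()):
        print(f"{word}: {grade:.2f}")
```

As noted above, a score like this works on spellings rather than senses and ignores context such as an explicit in-text definition of a hard word; raising `min_samples` only trades coverage for more data points per word.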

Thinking about this, another fun approach might be to crowdsource the whole scoring part. Set up a "survey" where people input their age and level of education, and are then presented with a series of words. Even a simple "I know this word" vs "I don't know this word" (perhaps an intermediate "I recognize but do not understand this word") response given by enough people could generate some pretty interesting data. I have seen this method used in other contexts, but I don't see why it wouldn't work for readability. Probably not a Wiktionary project but I assume that someone would be interested in what words people understand. The only measurement I am aware of for readability is the Flesch–Kincaid readability test and that doesn't work well for single words. - [The]DaveRoss 20:21, 12 February 2011 (UTC)

Poll: Categories of lexicons are "lexical categories"

The names of categories of Wiktionary normally follow a relatively clear distinction between "lexical categories" (including language names, among other characteristics) and "topical categories" (including language codes, among other characteristics). However, ironically, Category:Lexicons apparently is an exception by having a huge majority of subcategories that use the format of "topical categories", and a few exceptions that use the format of "lexical categories".

This situation leads to the awkward simultaneous existence of these category trees, among many others:

I believe it is possible to achieve logic and consistency (and, by extension, navigability) by using only one categorization system for all lexicons. For that reason, I developed this poll with an initial simple proposal and the common options to make decisions and comments about it. I expect to eventually develop this basic idea into other, more detailed, proposals, such as possibly decisions about individual categories and their individual names.

I would like to know the opinions of other Wiktionarians about it.

Thank you for your attention and your input. --Daniel. 06:40, 13 February 2011 (UTC)

Preference 1: Support

I prefer categories of lexicons with language names instead of language codes

For example, Category:pt:Vulgarities may be replaced by Category:Portuguese vulgarities, Category:Portuguese vulgar terms or other category name without "pt" but with "Portuguese".

If you agree 100% with this practice, or if you agree in essence with it, please vote for this option. Feel free to elaborate your thoughts.

Support Daniel. 06:40, 13 February 2011 (UTC)
It is consistent with Category:English nouns, Category:English symbols, Category:English phrases and Category:English suffixes, among other lexical categories. --Daniel. 06:40, 13 February 2011 (UTC)
Support. The name should include "in Portuguese" (used without in, Portuguese is ambiguous, as it may refer to the country or to the language). This should apply to all categories, including topical categories: for users, language codes are meaningless. Lmaltier 12:33, 13 February 2011 (UTC)
I agree that language codes are bad. But I don't agree about "in", and I'm not sure that we should drop the idea of a naming-convention distinction between grammatical categories and topical ones. —Ruakh_TALK 13:05, 13 February 2011 (UTC)
Support Mglovesfun (talk) 21:54, 13 February 2011 (UTC) In fact, I'd actually considered proposing this. Mglovesfun (talk) 21:54, 13 February 2011 (UTC)
Support I've always been very confused by these two systems. They didn't always make sense and sometimes seemed rather randomly chosen. So I'd be happy to see this confusion removed. I also think using language codes in category names is ugly, and gives preferential treatment to English. —CodeCat 15:59, 14 February 2011 (UTC)
I support Preference 1, but I also support the opposite proposal: that all such categories use the language code. As long as we're consistent: all lexical categories should look the same.—msh210℠ (talk) 15:52, 14 February 2011 (UTC)
If we rename all lexical categories to imitate the format of "topical categories", then it would cause a conflict between Category:French parts of speech and Category:fr:Parts of speech, because both would fit the name "Category:fr:Parts of speech". --Daniel. 05:48, 15 February 2011 (UTC)
Good point. (Unless we use FR for lexical cats and fr for topical, but that's unwise IMO.)—msh210℠ (talk) 17:22, 15 February 2011 (UTC)
It's a good point, but do you really think that users understand the difference? The solution is to make names clear (e.g. Parts of speech in French and Parts of speech categories in French). Lmaltier 20:43, 15 February 2011 (UTC)
One way to make sure users would understand the difference would be linking from fr:Parts of speech to French parts of speech and vice versa, and briefly explaining the difference in all categories. Parts of speech in French is a good suggestion, while Parts of speech categories in French is not; the latter does not convey well the concept it tries to address, and would require further context to be clearly understood. --Daniel. 20:53, 15 February 2011 (UTC)
Support For usability reasons. —Internoob (Disc•Cont) 04:51, 15 February 2011 (UTC)
Support Yair rand (talk) 17:39, 6 March 2011 (UTC)

Preference 2: Oppose

I oppose the proposal described as the preference 1

Oppose. I think that the categories should be standardized in the other direction, for example Category:Swedish swear words becoming Category:sv:Swear words. --Yair rand (talk) 22:00, 13 February 2011 (UTC)

If we rename all lexical categories to imitate the format of "topical categories", then it would cause a conflict between Category:French parts of speech and Category:fr:Parts of speech, because both would fit the name "Category:fr:Parts of speech". --Daniel. 00:54, 15 February 2011 (UTC)

Preference 3: Abstain

I am indifferent or indecisive about the proposal described as the preference 1

Abstain Ƿidsiþ 11:14, 14 February 2011 (UTC), but they should probably be standardised in one direction or another. Ƿidsiþ 11:14, 14 February 2011 (UTC)
Abstain I don't like the phrasing of the second option ("I oppose the proposal described as the preference 1"): preference 1 is not a proposal but a statement of user preference of A over B, and the alternative preference would be one of B over A rather than an opposition. I don't like the title of the poll ('Categories of lexicons are "lexical categories"'), as it does not match the text of the preferences. I would rather not touch the poll, but abstaining cannot harm, I guess. I surmise that categories of lexicons (such as "Vulgarities") are lexical categories rather than topical ones. I further surmise that categories of lexicons should use the naming convention of lexical categories, which is currently along the lines of "Spanish vulgarities" rather than "es:Vulgarities", unless someone gives reasons against. --Dan Polansky 17:44, 15 February 2011 (UTC)
On second thought, maybe it would actually be better to emphasize "user preference" over "proposal" as the Preference 2, for example by creating one of these alternatives:
- Preference 2: I don't prefer categories of lexicons with language names instead of language codes
- Preference 2: I prefer categories of lexicons with language codes instead of language names
One of these is a preference of "B over A", as you (Dan) suggested. Nonetheless, I'm happy with the existing headers, since there isn't any conflict between the concepts of proposal and preference, and "I oppose the proposal described as the preference 1" covers both alternatives. I see good logical choices here. Their wording and my reasoning could be simpler, though. --Daniel. 20:06, 15 February 2011 (UTC)

Discussion

There is one point that hasn't really been raised yet. What do we do with categories like Category:Agriculture? 'English Agriculture' just sounds odd... —CodeCat 15:46, 23 February 2011 (UTC)

How is Category:Agriculture part of "Categories of lexicons"? --Yair rand (talk) 17:28, 1 March 2011 (UTC)

Is this including etymology categories (Category:fr:English derivations, etc.)? --Yair rand (talk) 17:28, 1 March 2011 (UTC)

"Cambodian" and "Khmer"

The language whose code is km is named Cambodian here and Khmer here. Shouldn't these pages use only one name to refer to that language? --Daniel. 03:52, 14 February 2011 (UTC)

In a word, yes. Mglovesfun (talk) 11:59, 14 February 2011 (UTC)

Wiktionary:Votes/pl-2010-12/Names of individuals

The vote has started. --Daniel. 07:25, 14 February 2011 (UTC)

I just looked at it and the proposal needs serious rethinking. See the talk page. Eclecticology 23:18, 15 February 2011 (UTC)

Renaming CFI section for spellings 2

Wiktionary:Votes/pl-2011-02/Renaming CFI section for spellings ends on 17 February 2011, in three days. Only 5 people have voted so far. It would be nice if some more people voted; even explicit abstains would be nice. I can imagine most people just don't care about the subject of the vote, which makes plenty of sense, as the proposal is really merely cosmetic. --Dan Polansky 12:27, 14 February 2011 (UTC)

As I said on the vote page this change is misleading. The CFI page is about what to include, and this section discusses a particular type of entry, so "Spellings" does not address the problem. A better alternative would be to simply delete "common misspellings" from the title as a redundancy. Perhaps I should add this to the vote page as an alternative. Eclecticology 21:36, 15 February 2011 (UTC)

Quotations - Proposed enhancements

TO: Wiktionarians

FROM: Geof Bard

RE: Quotations - Proposed enhancements

Feb 14, 2011

Summary: Date; Regionalize; Situate & Tag Quotes (More Better)

I propose that:

(1) Wiktionary guidelines be amended to strongly encourage dates on all quotations; without them it is impossible to "date" the term or word in question. Since language is a living and evolving creature of living persons, the lack of dates seriously impairs their value.

(2) Wiktionary guidelines be amended to strongly encourage editors/writers to identify the region associated with the quote: at a minimum, whether we are talking about Castilian or Latin American Spanish, or American or British English. Southern writers who write in Southern idiom should also be identified, particularly when they use a word whose basic entry does not associate it with the region.

(3) A third area of concern is that there is seldom adequate identification of the slang subcultures that words originate from. For instance, some slang is clearly associated in its origin with African American urban communities of the northern USA. Is that not the case, for instance, with "homeboy"?

Other terms are affiliated with Chicago, where electric blues developed its own nomenclature.

(4) Fourthly, it seems that there is need for a greater array of classification templates. At this point, I will not elaborate that issue pending greater familiarity with Wiktionary culture.

Thank you in advance for your thoughtful comments.

Geof Bard

Geof Bard 02:08, 16 February 2011 (UTC)

You're very active for someone who has been contributing for less than 24 hours! Does it not occur to you to get a feel for how Wiktionary works before proposing this sort of thing?

Mglovesfun (talk) 11:41, 15 February 2011 (UTC)

Active is bad? Some of the bots add 100 articles per day; I don't think I have added so many.

I've been using English and other languages a bit longer than that. Didn't notice any rules reserving the opportunity to make proposals to the old guard. Isn't the point of a beer "parlor" to loosen up? See alcohol, tavern, R and R. Sometimes someone with a fresh perspective notices things you may have long gotten used to, aside from anonymous sarcasm; to wit, many of the existing quotes are not representative of the sense, except, obviously, for some specific place and time. You might find it of interest to take a look at the idea expounded in the book Zen Mind, Beginner's Mind. At any rate, the following post seems to take up my points regarding new context-based classifications, etc. I am surprised you aren't more open to the specific suggestions, as per your user page:

A dictionary is about quality and quantity, so correcting the entries we already have is very important.

But based upon your response, I should wait until my initial enthusiasm fades into jaded cynicism, and then proceed. That process is receiving a boost, so maybe I will implement the fourth item, below, and create a context template for a certain lingo or two. But, alas, my initial flush of enthusiasm is no longer at my disposal; well, seventy-two hours is a long time in cyber-time, so I think I'll work on something else for a few bytes.

Geof Bard 02:02, 16 February 2011 (UTC)

Your generous provision of management services is greatly appreciated.

Dating quotations: Feel free to get started on the entries marked with {{rfdate}}. There are also many entries marked with {{webster 1913}} that have the same problem.
Regional dialect ascription is usually important only if it leads to semantic, pronunciation, orthographic, or usage difference. Quotations should be associated with specific senses.
Register/subculture. Feel free to improve any entry for which you have good reason to believe you can.
A user formerly created new context-based classifications when there was a sufficient number of entries that used the {{context}} tag. Perhaps he or someone else will recommence that practice.

To make your comments more readable, try not to start any paragraph with a space. Also, it is greatly appreciated if contributors to this kind of page avail themselves of the Wikimedia software convention for generating a timestamped signature by typing ~~~~. DCDuring TALK 17:12, 15 February 2011 (UTC)

RE "Your generous provision of management services is greatly appreciated."

Dropping a comment card in the comment box is not the same thing as a hostile buyout.

RE# Feel free ... many entries have the same problem.

I'll take that as positive feedback.

Register/subculture. Feel free...

I'll take that as positive feedback.

RE: A user formerly created new context-based classifications

Maybe I will feel inspired to do that...

RE: Readability... I kind of liked the layout before, but you are probably more used to the standard layout, so I modified it. Yeah, I forgot the ~~~~ on that post, but then doesn't the bot do it anyway? The problem being, I suppose, server calls = more traffic and the possibility of an edit conflict. So anyway, yeah, of course, whatever.

Geof Bard 02:31, 16 February 2011 (UTC)

Plurals of proper nouns

Per WT:RFD#Qu'ran but also in many other entries: when proper nouns can have plurals, does that make them 'common nouns', such as Coke, Pepsi, Mars Bar (etc.)? I think no; proper nouns can have plurals, and there's no need for a separate PoS just to accommodate countable use, be it singular or plural. Mglovesfun (talk) 11:39, 15 February 2011 (UTC)

The entry Pepsi has only one English noun sense: "A portion of Pepsi." I'd say that sense is indeed a common noun, despite existing in context of a specific entity. There is already a proper noun definition to cover a similar but different concept. --Daniel. 13:03, 15 February 2011 (UTC)

@Daniel.: Agreed. —Ruakh_TALK 13:16, 15 February 2011 (UTC)

@Mglovesfun: If you say "a ____" or "____s", then you're not really using "____" as a proper noun. But almost every proper noun can be pressed into common-noun service in this way. Sometimes this sort of use is conventionalized/lexicalized enough that it's worth including as a separate sense; sometimes it isn't. When it is, ===Noun=== is the place for it. I think Bible and Bible both warrant common-noun definitions; I would imagine that Qur'an probably does as well. In the case of Coke and Pepsi, I'd actually be more inclined to include the common nouns (which denote the substances) than proper nouns (which denote the companies — not very common — or the brands — less common yet). In the case of Mars Bar, similarly, I'd be more inclined to include the common noun Mars bar. —Ruakh_TALK 13:16, 15 February 2011 (UTC)

In cases like these, proper nouns always seem to refer to specific instances of things. For example, Coke isn't a single thing, it's a type of thing that happens to have a trademarked name. The proper noun itself refers to the type of thing it is, not to that specific thing I might have in my hand. —CodeCat 13:34, 15 February 2011 (UTC)

Yes, in a Coke, Coke is a common noun, this is clear. But in a Churchill, Churchill is not a common noun, either it's a standard use of surnames, or it's a standard use of proper nouns (figure of speech, metaphor). Lmaltier 19:10, 15 February 2011 (UTC)

I noted last year that we lack a lot of plurals for proper nouns such as given names and surnames: things like Martins, Stevens, Eves (etc.) Mglovesfun (talk) 13:59, 17 February 2011 (UTC)

It's because of WT:Votes/pl-2008-06/Plurals from proper nouns. --Makaokalani 12:54, 19 February 2011 (UTC)

Legality of བྱང་ཆུབ་སེམས་དཔའ

The original section heading before renaming: "བྱང་ཆུབ་སེམས་དཔའ is this legal? Search that, look at what is there, discuss,if so inclined."

བྱང་ཆུབ་སེམས་དཔའ

It is Tibetan script, but some people might want an English-language definition. But maybe that is not "legal" here; I am the new kid on the block. Please advise.

Also, for that matter, http://en.wiktionary.org/wiki/M%C5%ABlamadhyamakak%C4%81rik%C4%81

Another example is http://en.wiktionary.org/wiki/%E0%A4%B6%E0%A5%82%E0%A4%A8%E0%A5%8D%E0%A4%AF%E0%A4%A4%E0%A4%BE

The reason I find these useful is in part because if people use Google, it is very difficult to find an English definition without finding a translator, translating, and then dealing with a cruddy "definition" of sorts. Wiktionary is more direct and can do it better with live wiki-style editing.

Frankly, I would like to see longer, more specific definitions result from lengthy debate, but the time for that will probably be in a few years.

Geof Bard 05:50, 16 February 2011 (UTC) Geof Bard 06:24, 16 February 2011 (UTC)

Yes, all definitions are written in English. It's generally best to 'translate' when a translation exists and explain the content at the English page. But when no direct English translation exists, lengthier definitions are the norm. Mglovesfun (talk) 06:36, 16 February 2011 (UTC)

RE: "it is best to 'translate' when a translation exists": not sure what you mean...

I have simplified the section heading. --Dan Polansky 07:26, 16 February 2011 (UTC)

That is, when a single word translation exists, or a phrase, basically anything that exists or could exist as an English entry, it's better to link directly to it and let the English entry explain its meaning. When there is no equivalent in English, you should try to provide a concise explanation of what the word means. Mglovesfun (talk) 13:12, 17 February 2011 (UTC)

Wiktionary:Votes/pl-2011-02/Romanian orthographic norms

Mglovesfun (talk) 12:57, 16 February 2011 (UTC)

Putting proto-languages into subpages

Currently we list the entries for proto-languages in the appendix in a 'flat' format: Appendix:Proto-Germanic *handuz. I think it would be better if we changed this to a subpage: Appendix:Proto-Germanic/handuz. This would make templates work a lot better, because the actual entry name can be easily extracted from the full name. It also fits better with how we have entries in English that can't go in mainspace. —CodeCat 12:10, 17 February 2011 (UTC)
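To illustrate why the subpage format makes the entry name easy to extract, here is a small sketch with two hypothetical helpers: one splits a subpage title at the first slash, and one maps an old flat title to the proposed subpage title (dropping the asterisk, as in the example above), which is the kind of mapping a batch page move would need. In wikitext, for titles containing a single slash, MediaWiki's {{SUBPAGENAME}} magic word yields the same entry part. The helper names and error handling are assumptions, not an existing tool.

```python
PREFIX = "Appendix:"

def split_subpage_title(title):
    """Split 'Appendix:Lang/entry' into (lang, entry).

    Splits at the first slash, so any later slashes stay in the entry.
    """
    if not title.startswith(PREFIX) or "/" not in title:
        raise ValueError(f"not a subpage-style appendix title: {title!r}")
    language, entry = title[len(PREFIX):].split("/", 1)
    return language, entry

def flat_to_subpage(title):
    """Map the flat 'Appendix:Lang *entry' to 'Appendix:Lang/entry'."""
    if not title.startswith(PREFIX) or " *" not in title:
        raise ValueError(f"not a flat appendix title: {title!r}")
    language, entry = title[len(PREFIX):].split(" *", 1)
    return f"{PREFIX}{language}/{entry}"
```

With the flat format, a template has to know where the language name ends and the starred entry begins; with the subpage format the slash is an unambiguous separator.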

Support Daniel. 13:06, 17 February 2011 (UTC)
This format would be consistent with Appendix:Unsupported titles/Less than three. --Daniel. 13:06, 17 February 2011 (UTC)

I don't oppose it; {{proto}} would need to be changed, quite a minor change, though. Mglovesfun (talk) 13:50, 17 February 2011 (UTC)

Sounds good to me. —Ruakh_TALK 13:53, 17 February 2011 (UTC)
Support switching to subpage format. (...on condition that we can continue to use the color green to indicate support :) ). --Yair rand (talk) 14:58, 17 February 2011 (UTC)
Support.—msh210℠ (talk) 16:36, 17 February 2011 (UTC)

It looks like this is getting enough support to pass. Would any of you like to help me move the pages once the changes to the templates have been made? —CodeCat 17:54, 17 February 2011 (UTC)

Pywikipediabot/movepages.py, FYI. This would be extremely easy to do. Any exceptions to watch out for? And should we keep the redirects? Nadando 19:20, 17 February 2011 (UTC)

I don't think there are any exceptions. But a few of the pages are redirects... so we'll need to move them too. I think we should have redirects from the old names to the new ones too. —CodeCat 19:38, 17 February 2011 (UTC)

I tried moving some of the pages by hand but it became really tedious after a while. But I don't know how to use that bot, either... —CodeCat 21:09, 17 February 2011 (UTC)

If someone else hasn't jumped on it by then I'll do it next time a dump comes out. Nadando 04:52, 18 February 2011 (UTC)

Oh sure. Why hasn't anybody thought of this before? -- Prince Kassad 17:59, 17 February 2011 (UTC)
I predicted it somewhere at Wiktionary:Beer parlour/2010/September. :p "[...] Appendix:Proto-Germanic *haglaz is not named Appendix:Proto-Germanic/*haglaz with a slash. So it's safe to say that reconstructed languages don't fit the system of constructed languages and fictional terms, and that they could continue existing separately, or they could merge in the future by means of additional proposals for a more convenient overall organization." --Daniel. 04:46, 18 February 2011 (UTC)

Poll: Sorting representative entries of topical categories

Often, a topical category can contain a representative entry: for example, Category:History may contain history. When this happens, people usually try to emphasize the entry in question by placing it right at the start of the list of members of the topical category. For example, the first entry listed as a member of Category:Time is "time", and the first of Category:Sex is "sex". On the other hand, the entry "biology" follows the alphabetical order, appearing below the "b" header of Category:Biology.

For what it's worth, "history" is not a member of Category:History, "geography" is not a member of Category:Geography and "chemistry" is not a member of Category:Chemistry, but they can be categorized anytime and fit into one of the two systems described above.

As noted above, that emphasis does not occur every time it is possible. As a result, I would like to know the opinions of other Wiktionarians about how to deal with this specific inconsistent aspect of topical categories.

Thank you for your attention and your input. --Daniel. 06:18, 18 February 2011 (UTC)

Poll: Sorting representative entries of topical categories — Preference 1

When possible, I prefer representative entries organized among other entries normally.

If you agree 100% with this practice, or if you agree in essence with it, please vote for this option. Feel free to elaborate your thoughts.

Support Daniel. 06:18, 18 February 2011 (UTC)
- I support this decision, for the following reasons:
1. Emphasizing a representative entry by placing it as the first item of the category is redundant, at least in categories of English words, because the category already has (or should have) a good description linking to it. The description is where people will almost certainly always be able to find links to representative entries.
  - For example, the description of Category:Geography is: "The following is a list of terms related to geography."
2. The entries to be emphasized should not always be members of the category in question.
  - For example, day and week are not members of Category:Days of the week, so a reader who searches for representative entries among the first few items of that category will fail to find them. On the other hand, both day and week can be listed within the description.
3. There is inconsistency of where to sort the entries in question to emphasize them.
  - The entry şah is sorted into Category:ro:Chess by a space, but échecs is sorted into Category:fr:Chess by an asterisk, among countless other similar examples. (This is a minor issue that can be fixed relatively easily, regardless of the decisions of this poll.)
4. It breaks the alphabetical order.
  - For example, if I am navigating through the "W" section of Category:Weather, I would like to see weather listed there. If "weather" is not listed there, then it gives the impression that it is absent from the category (i.e., either not defined yet or just uncategorized). Contrariwise, if one believes that "weather" should be the first item of the category, but it is actually listed below the "W" header, then the initial impression would equally be that the term is absent from the category.
  - As another example, it feels unnatural to see "psychohydraulics", "psychological refractory period", "psychologist", "psychometrician", "psychometrics" listed together, without "psychology" among them, as members of Category:Psychology.
- --Daniel. 06:18, 18 February 2011 (UTC)
Support for above reasons, but also because:
1. Making an exception to the alphabetical order adds complexity (see KISS principle).
2. Users have no reason to read a category page when they want to read the page. Categories are useful mainly when you want to find words you don't know, or words you have forgotten, or words you cannot enter with your keyboard.
- Lmaltier 22:03, 18 February 2011 (UTC)
- Just to clarify if necessary: Lmaltier's statement "Support for above reasons, but also because [...] it's more complex", for a reader without some knowledge of the context, seriously sounds like they're supporting a complex preference. :) Actually, the other preference is the complex one by comparison. This is the simple one, and I like it too. --Daniel. 23:33, 18 February 2011 (UTC)
  - You are right, I clarified my sentence, but proposing a change only to oppose it is disturbing. Lmaltier 17:36, 19 February 2011 (UTC)
    - Daniel. is not proposing a change, per se; he's proposing uniformity. Currently many users do things like [[:Category:Chess|*]] or [[:Category:Chess| ]]. —Ruakh_TALK 21:10, 21 February 2011 (UTC)
I don't like topical categories, but if we're to have them then I prefer this option.—msh210℠ (talk) 18:24, 21 February 2011 (UTC)
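The sort-key tricks mentioned above ("*" or a space) work because those characters order before letters. A minimal sketch, assuming a plain upper-cased code-point comparison as a stand-in for the wiki's real collation (MediaWiki's category collation is configurable per wiki via $wgCategoryCollation, so actual ordering may differ): a member with an explicit sort key is ordered by that key, and otherwise by its own title, which is what Preference 1 amounts to. The function name and data shape are hypothetical.

```python
def sort_members(members):
    """Order category members by sort key, falling back to the title.

    members: list of (title, sort_key_or_None) pairs.
    Assumption: upper-cased code-point comparison approximates the
    wiki's collation; ties are broken by the title itself.
    """
    def key(item):
        title, sort_key = item
        effective = sort_key if sort_key is not None else title
        return (effective.upper(), title)
    return [title for title, _ in sorted(members, key=key)]
```

With `("psychology", " ")` the entry jumps to the top of Category:Psychology; with no sort key it lands between "psychologist" and "psychometrics", matching the alphabetical placement argued for in point 4.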

Poll: Sorting representative entries of topical categories — Preference 2

When possible, I prefer representative entries listed as the first items of their respective topical categories.

If you agree 100% with this practice, or if you agree in essence with it, please vote for this option. Feel free to elaborate your thoughts.

Support. This provides a basic means of making sure that the title of the category maintains some connection with the ordinary meaning of the words used. This seems especially important as so much of the naming and structure of our categories is the product of a narrow base of users who are not native speakers of English. Moreover, where the name of the category does not correspond to an entry name or a specifically sanctioned combination of entry names, the category should be renamed. DCDuring TALK 19:11, 21 February 2011 (UTC)
Do you have any example of category that should be renamed because its name does not correspond to an entry name or a specifically sanctioned combination of entry names? --Daniel. 19:57, 21 February 2011 (UTC)

Or, perhaps, do you have any example of topical category that was renamed for this reason? Or an example of a topical category that never existed, but would have to be renamed for this reason? --Daniel. 08:47, 22 February 2011 (UTC)

Poll: Sorting representative entries of topical categories — Preference 3

I am indecisive or indifferent about where to sort representative entries of topical categories

Feel free to elaborate your thoughts.

I don't think topic categories should include the "representative entries" at all. --Yair rand (talk) 20:06, 21 February 2011 (UTC)
How come?—msh210℠ (talk) 20:15, 21 February 2011 (UTC)

Poll: Sorting representative entries of topical categories — Discussion

Vote: Deprecating less-than symbol in etymologies

I have created Wiktionary:Votes/pl-2011-02/Deprecating less-than symbol in etymologies, as a follow-up on the recent poll #Poll: Etymology and the use of less-than symbol, February 2011.

The vote is planned to start on 22 February 2011 and last 14 days.

The vote is supposed to be a mere formality after the poll showed a supermajority preference for one of the discussed options. What could theoretically be controversial about the vote is the proposed use of "From A, from B, from C" rather than "A from B from C" (comma rather than no comma). If this turns controversial before the start of the vote, I will remove this from the vote and leave it open instead. --Dan Polansky 09:03, 18 February 2011 (UTC)

Indefinitely block User:Daniel.

Discuss. Have fun y'all. Mglovesfun (talk) 22:54, 18 February 2011 (UTC)

Reason? -- Gauss 23:04, 18 February 2011 (UTC)

Disruptive edits; creating hundreds (maybe) of non-dictionary entries and categories to match. I've discussed it in private with various admins and to be honest, there's more support for it than I imagined. I think the thing is that the people involved don't realize how much support there is, so they keep quiet. Mglovesfun (talk) 23:08, 18 February 2011 (UTC)

And to think a vote on this not too long ago failed... -- Prince Kassad 23:15, 18 February 2011 (UTC)

For the record, that vote was about desysopping and ended 5-9-6. -- Gauss 23:51, 18 February 2011 (UTC)

So they keep quiet? What a pity. I happen to like to be praised once in a while. --Daniel. 23:40, 18 February 2011 (UTC)

Daniel., I can't help feeling that you might serve the project better by adding Portuguese words rather than fictional dogs and other characters. (Just a thought, because I discovered a few days ago to my surprise that Category:Polish nouns seems to contain more entries than Category:Portuguese nouns (5052 vs. 4277), while I would have expected a different relation.) Better in terms of global benefit, I mean. -- Gauss 23:52, 18 February 2011 (UTC)

Right now, I can help by adding and attesting fictional characters mainly because this is a controversial subject that should be organized sooner or later, especially since there are active discussions everywhere about it. There are other ways to help too, but I really don't feel like adding 775 random Portuguese nouns right now just to keep up with Polish nouns. --Daniel. 05:33, 19 February 2011 (UTC)

But wouldn't it be enough to add just one fictional dog and wait until that entry has been discussed? Or to create a vote? I am irritated by your strategy of flooding the project with controversial entries, as if you were hoping that nobody will have the energy to rfd/rfv them all. As if the other editors were rivals and you were trying to outfox them. --Makaokalani 12:51, 19 February 2011 (UTC)

I created and participated in numerous discussions and a few votes about this subject. I kept fictional characters within Category:Fictional characters for the convenience of anyone who wants to see them, nominate them for attestation, etc. as a whole. Most of the proper nouns from that category were not originally created by me, though I searched for them and organized them into something as logical and consistent as I could for now. --Daniel. 13:02, 19 February 2011 (UTC)

Oppose Daniel. 23:24, 18 February 2011 (UTC)
- Because I like Wiktionary. --Daniel. 23:25, 18 February 2011 (UTC)

"Indefinitely" seems like overkill. His edits are highly controversial, but they're not harassment, not quite vandalism, and probably not trolling. Even in such extraordinary numbers, I don't think they add up to an indef-blockable offense, at least without trying less severe remedies first. (That said, if this came to a vote, I might "support" it. An indef-block might be better than doing absolutely nothing.) —Ruakh_TALK 01:41, 19 February 2011 (UTC)

Or you (Wiktionarians as a whole) could as a community decide to support my actions, or oppose my actions. Both forms of consensus already have been working well. --Daniel. 01:58, 19 February 2011 (UTC)

As before, wild accusations leveled, but no evidence offered. —Stephen (Talk) 05:16, 19 February 2011 (UTC)

I think you should first try to find support for a weaker statement. The statement you are now seeking support for is "Daniel. should be indefinitely blocked", a statement that is executable and has severe consequences for the user. Weaker statements, not immediately executable, include "Some behavior of Daniel. is unworthy of an admin", "Daniel. does many controversial edits", or "I wish Daniel. would change some of his behavior". I would sign all three statements.

To give a vote on an indefinite block even a minute chance of succeeding, you would probably need to build a case, which is a lot of work, I am afraid. --Dan Polansky 07:01, 19 February 2011 (UTC)

Some entries created by Daniel are not words and should be deleted, e.g. Clifford the Big Red Dog or Hound of the Baskervilles, but I think the simple solution would be to clearly state the principle that all words are accepted (whatever their meaning), but only words (this includes phrases that belong to the vocabulary of the language and can be studied from a linguistic point of view). Lmaltier 13:28, 19 February 2011 (UTC)

Among all the personal positions on criteria for inclusion in Wiktionary, yours is usually presented as very simple, yet it is subjective. Where does one draw the line between a word and a nonword?

Per our current policies, both entries "Clifford the Big Red Dog" and "Hound of the Baskervilles" would be idiomatic (they have characteristics that can't be inferred from the parts of their names), but dependent on their universes (because one has to understand their stories to recognize the characters); their independence would then be established by certain citations (and they're cited already). --Daniel. 13:54, 19 February 2011 (UTC)

While this argument sounds plausible, such activity is a waste of time (mainly yours, admittedly) because pretty much no practical person is going to look for such content in a dictionary - for someone finding a reference to a dog named Clifford and suspecting it to be a fictional character the information given is hardly useful in any respect (except perhaps the link to WP). On the other hand, if I wanted to know, say, the word for subtitle in whichever foreign language or related grammatical information on it then it would be natural to look for it in a dictionary. The success of this project, like the success of most others, depends on how useful it is, not how many entries it contains. In the long run, too many entries which just barely do not fail CFI are detrimental to WT's reputation, I guess. -- Gauss 14:42, 19 February 2011 (UTC)

First of all, while I don't consider this task a waste of time, I still feel surprised by the fact that the existence of entries of fictional concepts apparently is attributed mainly to me; while, by comparison with other editors, my main role was just of organizing these entries. (It would be like saying that I created most of the members of Category:Appearance just because I populated that category.)

Mickey Mouse, for example, contains a pronunciation section and various translations; I believe other languages would include declensions as well. Citations:Lassie contains a considerable number of examples of usage of the name of a certain dog, along with a few explanations of its characteristics. Wikipedia is not a very good place to find these pieces of information, except in certain cases; in my experience, "certain cases" are translations only back to the original language when different from English and pronunciations only of entities who merit individual pages, and mentions mainly by "reliable sources" by linking back to them. While the pronunciation of "Mickey Mouse" can be derived from those of "Mickey" and "Mouse", that hardly would be the case for Pikachu or Hulk. --Daniel. 15:00, 19 February 2011 (UTC)

The distinction between something which is a word and something which is not a word is not always obvious, but, in many cases, it is obvious and consensual. It's clear that Lassie and Clifford are words you can find in English texts, and that Clifford the Big Red Dog is a title, but is not considered a word by anybody (not even you, I think), unlike New York. If it can be argued that CFI consider that this is a word, then that means CFI should be revised (they're much too complex anyway, this is why there are such discussions; see KISS principle). Lmaltier 15:18, 19 February 2011 (UTC)

Our rules as currently interpreted don't seem to exclude Clifford the Big Red Dog. If you object to the rules, then you might want to propose amending them. IMO, prime candidates for reconsideration are WT:FICTION, WT:BRAND, the toponym vote, and the vote on attributive use/names of specific entities. The chickens have been coming home to roost -- and they look like Daniel.. I dislike such entries, especially with the encyclopedic definitions and misleading way in which they are purportedly cited, but those are quality issues, not rule violations. Daniel. might make a good attorney. He seems very much inclined to take our rules and push them to their limit. I think of the creation of the one-page per word appendices for words from fictional universes. DCDuring TALK 16:32, 19 February 2011 (UTC)

I'll try to propose a general umbrella for CFI, basic principles acceptable by everybody. Lmaltier 17:30, 19 February 2011 (UTC)

Such an undertaking is a heroic effort, in the nature of drafting the Napoleonic Code. DCDuring TALK 18:37, 19 February 2011 (UTC)

Nah! Trivial! All you have to do is unambiguously define the words all, word and language. The "all words in all languages" does the rest. SemperBlotto 18:42, 19 February 2011 (UTC)

I've already suggested defining "word" and "language" unambiguously, at least. :p This might help. --Daniel. 19:05, 19 February 2011 (UTC)

There are far too many obstacles to getting a usable CFI. First, we have a sizable population of users who don't want proper nouns at all, or at least want to exclude most proper nouns that we currently do allow. Second, we have a much larger group of users who have "CFI shouldn't allow Pokemon (or similar)" (or, more accurately, "Wiktionary needs not to look silly") as a very high priority, regardless of whether it's possible to have a general CFI that disallows "unprofessional" entries without making overly specific additions. Third, Wiktionary's scope has a lot of areas, and many people are looking at CFI from the point of view of any of these individual areas. (Looking at it from a definition perspective and thinking whether anyone needs a definition for it, looking at it from a translating perspective and thinking whether translations could be useful, looking at it from a Rhymes perspective and thinking whether anyone would be surprised at seeing the word in a rhymes list.) Fourth, "attestation" is impossible in way too many situations, and we don't have any other way of verifying anything. Fifth, we want neologisms and we don't want protologisms, and no one has a cutoff point in mind. Sixth, when to include brand names, trademarks, company names, and anything else sufficiently commercial is a nightmare to figure out. (Anyone want to continue the list?) --Yair rand (talk) 05:29, 20 February 2011 (UTC)

So, what to do? Things can change (the project is still very young). I think that the best way is to come back to objectives and basic, founding, principles, to clarify them, and to agree on them. This would make a consensus on detailed criteria much less difficult. Lmaltier 11:23, 20 February 2011 (UTC)

Oppose blocking. --Yair rand (talk) 05:29, 20 February 2011 (UTC)

Strong oppose. --Anatoli 22:30, 20 February 2011 (UTC)

Thank you guys, for all comments in opposition to the original idea of this thread. (: --Daniel. 10:20, 21 February 2011 (UTC)

While I think it is clear that, in general, most people don't want to block you, I don't think you should flippantly overlook the fact that there are a large number of contributors who are upset or annoyed to one degree or another with the way you behave on this project. In your behavior and attitude (according to what I have read from you) you seek out controversy for the sake of pushing boundaries. This is not something which I believe fosters community and I don't think it helps further the project.

I would hope that a smiley face and an utter disregard for the serious concerns that other contributors have raised about your activity are not your response, but rather that you would evaluate what you are doing and how you are doing it and perhaps concede that picking fights is not the best way to change policy here. Don't consider this a victory for your methods; if you do, then my vote here is an unambiguous support for blocking. We need more people willing to propose a change, discuss a change, find a compromise that meets widespread support, and then effect that change; we need fewer people who take a position and refuse to change despite widespread concern over that change.

If I didn't think you had the potential to contribute in an unquestionably positive way I would just vote support here and not really qualify it. Your contributions to date however are riddled with lots of low- to no-value entries which will probably end up being deleted when your crusade to change the CFI by drowning it in borderline or over-the-line dross is ended, either by you getting tired of it or the community getting tired of it and ending it for you. Take a few minutes (or longer) to think about how best you can serve the goal of Wiktionary and whether it is in a flood of fictional characters or perhaps some other method which you can enjoy and is also widely accepted. - [The]DaveRoss 11:39, 21 February 2011 (UTC)

I don't seek out controversy for the sake of pushing boundaries; I believe the closest to this I have said is: "I can help by adding and attesting fictional characters mainly because this is a controversial subject that should be organized sooner or later, especially since there are active discussions everywhere about it." I can also discuss about them; for example, right now. I can also create, delete or refine entries based on discussions, as I have done before.

Seriously, I don't think "flippant" behavior is an issue. So, I don't complain about this discussion being opened with "Discuss. Have fun y'all."

Surely I may have overlooked or disregarded something important; however, it would be more productive to talk about disagreements in plain English instead of replacing facts with the simple proposal of indefinitely blocking me. Anyone, feel free to criticize my actions, and even feel free to point out ways of improving my behavior as an editor, though I'd appreciate it if you could demonstrate that you've read my replies, either by replying directly to them, or perhaps merely by not repeating a particular criticism if it has been proven wrong and you aren't willing to counterargue.

If I may continue giving advice on how to interact with me (and possibly with other people), then let me point out that, when you disagree with me, I will most certainly disagree with you too. That seems a basic logical reasoning, though it apparently has been neglected by some people. When a Wiktionarian says they disagree with an entry, a policy, or an action, they're making a point. When they say "you're so stubborn, so you should get out of here indefinitely, then my opinion will prevail", they're just sounding ridiculous. For example, when I discussed the creation of a category for individual people, then created Category:Individuals out of consensus and populated it, a sense of Jesus Christ that was created in 2004 suddenly became the subject of an RFD discussion under the argument that it was part of "[seemingly] wilfully anti-community" "mass addition of entries contrary to consensus". That just didn't make sense on various levels.

I've been an editor since 2006 and an administrator since 2009; of course I've discussed my own and others' actions many times. While "we need fewer people who take a position and refuse to change despite widespread concern over that change" is subjective enough to be impractical, as a rule of thumb, I don't even qualify for that specific criticism because I like to open discussions with people to know their opinions; when I am engaged in editing entries, templates and categories, I often take suggestions from other people (though there are some suggestions not yet implemented, so I apologize for making anyone wait for my help); when I discuss, I often comment on others' opinions.

I don't have any plans of irritating people or picking fights; on the contrary, more than once when people attacked me or my ability to be an editor, I discussed with them until reaching an amicable agreement. I guess I made two or three friends that way. However, I don't feel obliged to change my opinions to keep up with others' opinions, even when I agree to follow them.

Personally, I'd hardly call 152 entries of fictional characters a "flood". (Maybe it was just a "flood" when people saw streaks of related edits in their recent changes; or, perhaps, my definition of flood differs from theirs, especially if they possibly want a Wiktionary devoid of fictional characters.) And I hardly would attribute most of their existence to me, because... I didn't create most of them. If you compare the categories of fiction with the ones of mythology, you would most certainly notice that the latter is much messier; it seems no one took the time to clean the latter up, while I organized the former (though it isn't perfect yet, mainly due to inconsistencies of topical categories as a whole, so I've been creating discussions to gather consensus on this subject). I believe categorizing names of fictional characters into Category:Fictional characters is already "unquestionably positive", regardless of whether or not their members are subject of controversy. On the other hand, narrower categories such as Category:Fictional people and Category:Fictional dogs may be less likely to meet consensus, so naturally I've been discussing them too. --Daniel. 13:45, 21 February 2011 (UTC)

Support blocking. --Fandelasketchup (talk) 10:30, 24 February 2011 (UTC)
Why? DCDuring TALK 13:58, 24 February 2011 (UTC)

Dating rfts

I don't come that often, so this may be old news, but it seems to me that the column on the right of the Tea Room listing all rfts is new. I like it, but lots of these discussions are stale and have been archived, so that clicking on the word takes you to the page and then clicking on the rft just takes you back to the Tea Room without helping you find the discussion. If the datestamp of the rft were added automatically when the rft is added, that would make finding the original discussion easier. Not sure this is clear. Let me know if not.--Brett 16:33, 19 February 2011 (UTC)

Tea Room discussions should be copied to the talk page for the entry, IMO. DCDuring TALK 16:50, 19 February 2011 (UTC)

Not new, just newly overprominent: in the new version of MediaWiki it becomes a ginormous <pre> for some reason. I've found a hack to make it look better, which I applied the other day on WT:RFV and just now on WT:TR.
To find stale archived discussions, you can use [[Special:WhatLinksHere/foo]].
I agree with DCDuring, except that I think they should actually be moved to the entry's talk-pages rather than copied there. I don't see a need for a central Tea-room archive at all.
—Ruakh_TALK 17:03, 19 February 2011 (UTC)

Copying to the discussion page makes a lot of sense. Not sure if there should be a central archive of discussions. Duplication wouldn't be good, but at least a list of discussed words with links to the discussions on the individual words' pages.--Brett 17:07, 19 February 2011 (UTC)

The archive is automatically there in history through the magic of Wiki software if we really need it. (But why would we?) Having the substance at the talk page is more useful to users, passive and contributing, than an archive of discussions whose existence is unknown to a user of the entry. I'd be happy enough with no archive, other than in the form of history. But links to the archive would also be acceptable if that is easier to do or, better, to automate than moving the content to appropriate talk pages. Perhaps someone could process the history from a full dump into something more accessible. More usefully perhaps someone could process the TR archive into talk page links (avoiding duplication, if possible). DCDuring TALK 18:20, 19 February 2011 (UTC)

I've started on Wiktionary:Tea room/2010/March (so chosen because the oldest-tagged request for tea was on that page). One problem I've quickly run into is that many tea-room discussions aren't really about one individual entry, or even about a small, explicitly delineated clutch of entries. I'm archiving those that I can; those that I can't, I guess will remain on that page. —Ruakh_TALK 16:31, 21 February 2011 (UTC)

Question: I've been using {{rft-archived}} for these, but it occurs to me that such an archive-box may not really be helpful for these in the way that they are for RFD and RFV discussions (where we want to preserve the exact discussion that led to whatever decision). I'm wondering if I should just move these the talk-page with whatever header seems appropriate to me, and a hatnote explaining where the discussion started? I mean, if someone wants to reply (belatedly) to an old tea-room comment, there's no reason they shouldn't just reply in the normal threaded-discussion fashion, right? —Ruakh_TALK 00:23, 24 February 2011 (UTC)

Not IMO. It can, at least sometimes, make it appear as though those commenting in the TR discussion might have done so on the talkpage (or that later responses might have been made in the TR discussion), and so might be expected to respond to responses to what they said, and a "silence is acquiescence" argument might be made. (Not that that argument is particularly strong on a wiki anyway, but still.)—msh210℠ (talk) 16:50, 24 February 2011 (UTC)

Prefacing verbs with to in Template:en-verb

Currently, Template:en-verb automatically adds to to before the base form of the verb. I suggest that we do away with this for the following reasons:

The word to is no more part of the verb than the is part of the noun. Rather it is a subordinator that marks the following verb phrase (VP) as subordinate and infinitive, similar to the way that that marks the clause as subordinate in ...that he arrive on time.
It is not the infinitive form of the verb that is being shown but rather the base form, which happens to be used in the infinitive, the subjunctive, and the imperative. These are all types of clause or VP, depending on your definition of clause, not verb forms. Of these, only the infinitive employs to.
Even if it were the infinitive, and not the base form, there is the marked to infinitive (e.g., I want to go), and the bare infinitive (e.g., make me go).
The standard among other English-language dictionaries is to present the verb without the to.

--Brett 17:04, 19 February 2011 (UTC)

The principal reason for retaining the "to" is that it is yet another backstop against users mistaking a verb entry for another PoS and that some of our term template glosses retain "to" to make clear that the etymon is a verb, not a noun or other PoS. I suppose the backstop is redundant where inflection is shown. But we also have many entries for phrases that are headed by a verb that do not show any inflection. IOW, "to" in an entry or entry section serves as a marker that the entry/section concerns a verb. In some cases (eg, glosses) it is not redundant. The uses in the inflection line are redundant because of the PoS header and sometimes the content of the inflection line. DCDuring TALK 18:33, 19 February 2011 (UTC)

It is in the inflection line that I'm suggesting it be removed. It is both redundant and misleading. I don't know what a "term template gloss" is, but I agree that you couldn't remove the to in cases like crawl: to move slowly unless you moved to full sentence explanations such as If something crawls, it moves slowly, a change I'm not advocating.--Brett 18:48, 19 February 2011 (UTC)

Keep the 'to' and keep the 'a' in {{ro-verb}}. Mglovesfun (talk) 22:05, 20 February 2011 (UTC)

Why?--Brett 00:58, 21 February 2011 (UTC)

Just feels right. Mglovesfun (talk) 14:08, 21 February 2011 (UTC)

In an inflection line, "to" seems like a pretty clear way of indicating that the following word is the base form of the verb. But is such an indicator necessary or helpful? I don't know. —Ruakh_TALK 14:11, 22 February 2011 (UTC)

You have some good points there, including the precedent of English dictionaries. However, it seems that most of the points also justify the removal of "to" from the definition lines, which is not customary. I don't really know; interesting points, anyway. --Dan Polansky 14:42, 22 February 2011 (UTC)

The other thing is that our inflection line displays the principal parts for a verb, making it a bit like a kind of grammatical table rather than just a dictionary lemma. And while the to-form is not that common in dictionary headwords, it's very common in declension tables and the like. Ƿidsiþ 14:52, 22 February 2011 (UTC)

I agree that removal from the definition lines is not customary. Of the one-look dictionaries, only COBUILD learners and Wordnet do so. COBUILD uses complete sentences. Only Wordnet uses to-less infinitive clauses. But I don't agree that the arguments are the same. The word forms should show the word on its own. The definitions, however, are typically infinitive clauses: words used with other words. And in English, when we use infinitive clauses as subjects or complements of linking verbs, they are always marked with to (e.g., subj: To join a group is..., comp: ...is to join a group.) Notice that this is also an issue of mention vs. use: when you mention a word, the typical syntactic properties it has don't apply.

I also think the argument regarding declension tables is misleading. First of all, for the reasons I pointed to above, I think this practice is a mistake even in those tables that employ it. Secondly, again as I pointed out above, many dictionaries list all the forms. These could be equally said to resemble declension tables, but the major dictionaries don't use to here. Finally, more modern declension tables often don't list to. For example, the Azar English grammar series (Pearson Longman), one of the most popular ESL grammars in the world, simply gives the verb alone in its lists of irregular verbs.--Brett 16:28, 22 February 2011 (UTC)

This issue doesn't seem to generate much interest, but I'll give it one more shot with another analogy. Putting to in front of the verb is like putting be in front of the present participle. Yes, it commonly appears there, but it isn't part of the word form, and it's inaccurate to include it.--Brett 15:52, 24 February 2011 (UTC)

I find your reasoning rather convincing. --Dan Polansky 16:30, 24 February 2011 (UTC)

OK, since there don't seem to be strong opinions about this, should I "be bold", change it, and see what happens, or should we put it to a vote?--Brett 13:19, 26 February 2011 (UTC)

I've made the change and copied this discussion to the template's discussion area.--Brett 12:20, 28 February 2011 (UTC)

Basic principles for CFI

Here are a few simple principles to be included as the beginning of CFI, for comments and improvements before starting a vote. Of course, detailed CFI should be made consistent with these principles. They don't solve everything (detailed CFI are still required), but they should help much. They don't deal with inclusion criteria for languages: this is a very important, but distinct issue.

1. The objective of the Wiktionary project is to give people information required when they want to understand a language, or to speak (or write) in a language. More precisely, learners of a language may have to learn:

encyclopedic knowledge about the culture of native speakers
linguistic knowledge about:
- the grammar of the language
- the vocabulary of the language.

Encyclopedic knowledge is provided by Wikipedia, not by Wiktionary. Main space Wiktionary pages focus on the vocabulary part, i.e. lexical items (including set phrases) that learners may have to learn if they want to understand, to speak and to write the language as well as native speakers, even in very specialized domains (note that this also applies to obsolete words and dead languages, despite the way the rule is expressed).

2. Lexical items as defined above are called words for the purpose of these CFI. Wiktionary describes words from a linguistic point of view, i.e. it provides information of linguistic nature only, in addition to definitions (and possibly pictures). Definitions must be as succinct as possible, but clear, and sufficient to fully understand the meaning of the word. When present, pictures must help to understand the meaning of the word. In addition to words, Wiktionary also describes some other items of linguistic interest: affixes, characters, proverbs (list possibly to be completed).

3. All words of all languages are accepted for inclusion. All forms of all words may be included too (unlike other dictionaries), except in the case of phrases (restrictions may apply to this case for practical reasons).

4. A language section for a word may be created if and only if this word is used in this language. Used means that some people have used the word (not only mentioned it) and expected other people to understand it; this excludes typographic errors, misspellings, errors made by people learning the language, etc. (there may be exceptions when interesting and useful linguistic information may be provided about such errors). This rule does not exclude words which are not fully naturalized in the language (e.g. an English section is allowed for autoroute, despite the fact that it's difficult to consider this word as an English word). When the existence of a word in a language is disputed, attestation rules apply. These attestation rules may be more or less strict for different kinds of words in order to prevent the creation of useless entries (e.g. brand names just coined by a newly-created company).

5. Rarity and recentness are not considered; the only important thing is that the word must exist in the language (actually, pages dedicated to rare words and to recent words are likely to be very useful to readers, because such words are likely to be absent from other dictionaries).

6. When it is obvious to everybody (or almost everybody) that something is a word, and that this word is used in the language, the creation of a section of this language for this word is accepted (detailed CFI are not applicable).

7. When it is obvious to everybody (or almost everybody) that something is used in the language, but cannot be considered as a word as defined above (element of the vocabulary of the language), this item is not accepted (detailed CFI are not applicable).

8. In other, less obvious, cases, detailed CFI explain in a more detailed way how the present basic principles should be applied in specific cases.

Lmaltier 22:04, 19 February 2011 (UTC)

Why? It seems to push principles like "all words of all languages" that recently failed a vote, and pointless obviousness principles--if it's obviously a word, then cite it; if everyone agrees it's obviously not a word (and I'm not sure what you mean there), then the RfD should be short. And I really don't know why it's hard to consider autoroute an English word. "the growth of the suburbs, despite its uneven internal distribution in the period 1951-75, was intimately linked to the growth of the network of autoroutes and bridges surrounding Montreal." is just one Google Book hit that's clearly using the word in English. Combine that with a general oververbosity and unnecessity, and I don't see the point.--Prosfilaes 04:17, 20 February 2011 (UTC)

I may be wrong about autoroute, please change the example if needed. But of course, the word is used in English, this is why there is an English section. I just wanted to insist on the fact that the question should not be Is this an English word? but Is this word used in English?.

I don't know what vote you refer to. This principle all words, all languages is not new, it's present from the origin of the project, and it's the first sentence of CFI. The goal is to clarify what it means.

Something new is that people need a dictionary not only to be able to understand, but also to speak or write. This fact has important consequences on which phrases are acceptable (you cannot guess that something is the set phrase to be used to express what you want if you have not learned this set phrase and you've never heard it).

But the main point is that criteria can never be formulated perfectly. This is the reason why there are so many discussions based on the letter of rules, but the spirit of rules, basic principles, common sense, are forgotten. When basic principles are sufficient to take a decision, there is no need to enter into details. In other terms, basic, sound, principles are more important than imperfect detailed rules, which are required only in some cases. You might compare what I propose to a Constitution and detailed CFI to laws detailing this Constitution. Lmaltier 09:23, 20 February 2011 (UTC)

There was an attempt at Wiktionary:Purpose to try and define the reason "why" we should have such rules — it's all very well stating them (and I mainly agree with what you've put, though think they are far from "basic" principles :) but if there's no "why" then it's completely open to debate.

I like the distinction between cultural-knowledge and linguistic knowledge. It may be the case that this can be used to decide whether or not something is a "word". Given a sentence like "Go past Draycot Foliat and take the first on the left", I don't need to ask "What does Draycot Foliat mean?", it's clearly just a name for something; rather I would ask "What is known about Draycot Foliat?" (lest I fail to recognize it) — a question Wikipedia is more suited to answer. On the other hand, if I have a sentence like "Draycot Foliat is a tithing" — I need to ask "What does tithing mean?", I could also ask Wikipedia "What is known about tithings?" if I wanted more in-depth discussion.

I also like the OED's FAQ [6], and [7] is also interesting. Conrad.Irwin 02:48, 21 February 2011 (UTC)

I was assuming that the "why" was explained clearly enough. You are right, this is fundamental. Lmaltier 07:07, 21 February 2011 (UTC)

German CFI

I've started a draft here. The problem with German is that it writes most words together, so the community is split on whether such words can be technically sum of parts. These rules are supposed to clarify the situation. Feel free to suggest things. -- Prince Kassad 00:04, 20 February 2011 (UTC)

I believe that we should accept all German words (i.e. everything considered as a word in German) provided that their use can be attested. Some time ago, there were many comments on the longest German word used in official texts. These comments clearly show that even very long compound words are considered as words in German (but this kind of word is exceptional). The attestation constraint is sufficient to limit the inclusion of such words. Lmaltier 09:32, 20 February 2011 (UTC)

Well, we don't think the same for any of the East Asian languages (Chinese etc.) so why should German get a special treatment? -- Prince Kassad 09:53, 20 February 2011 (UTC)

Is this the same case? I want to accept words that are considered as words by people speaking German. Are Chinese... "words" you refer to considered as words by Chinese? Lmaltier 09:57, 20 February 2011 (UTC)

Do note, in any case, that it is not ultimately my goal to forbid legitimately useful entries which people will want to look up. Instead, my intent is to prevent editors from adding entries like neuntausendneunhundertneunundneunzig, which are not useful to anybody and just waste time and resources which could be better used elsewhere. Attestation criteria are insufficient for restricting these because of the large available German corpus, which allows even the most improbable compounds to be cited (including this one). -- Prince Kassad 14:37, 20 February 2011 (UTC)

I agree on this one: nine thousand nine hundred and ninety-nine is equally a waste of time and resources. So this is not necessarily a question of German CFI only. --Hekaheka 07:56, 22 February 2011 (UTC)

This is a very special case. I agree that this is not very useful, and I think that such entries (in German or in other languages) should not be created by bot (creating billions of such entries by bot would be a waste of resource, you are right). They should be created manually, and only with several real citations. Don't worry, nobody will be willing to try to find millions of citations and to create manually millions of such entries. But such entries may be useful nonetheless in some cases, especially to people not knowing the language at all and trying to decode a text (e.g. a message they received), this is why they should not be forbidden. Lmaltier 14:56, 20 February 2011 (UTC)

So is there now going to be a rule that someone has to provide citations for German words before entering them? A polysynthetic language would be hopeless to document word-by-word; every new document would have new words. Including every word we could find three times would produce an arbitrary mess. In certain languages, a reader has to be assumed to be able to build and dissect SoP words with us just providing the Ps. It seems like German is one of those languages.--Prosfilaes 18:47, 20 February 2011 (UTC)

You reason as somebody speaking English. de.wikt does not reason like you; it includes words such as Tanzschule. Yes, these words should be accepted, but they should be attested, just like any word. Lmaltier 22:01, 20 February 2011 (UTC)

How is it that my reasoning is connected to my language? And what about polysynthetic languages where virtually every word is nonce? As a practical matter, very few of our words are actually attested; the requirement in practice is that they be attestable, not attested.--Prosfilaes 01:57, 21 February 2011 (UTC)

I meant that, sometimes, the equivalent phrases in English are not considered as words, but that does not mean that German words are not words. You are right, languages with many words are likely to produce many pages here. When I write "attested", I mean words that we know have been used. Lmaltier 07:11, 21 February 2011 (UTC)

If having a million pages still doesn't let you reliably look up words, then there's a real question of what the value of having a million pages is, if we need to approach the problem another way. I don't see why we should produce a bunch of pages by hand, if we can produce equally high quality pages by bot, and if entries like neuntausendneunhundertneunundneunzig are acceptable, then we should produce them by bot, which should have no problem verifying the existence of three cites before creating the page.--Prosfilaes 07:40, 21 February 2011 (UTC)

I see no reason to exclude any orthographic word in German, no matter how sum of parts (obviously if it can be attested). Why worry about it? What "resources" does it waste exactly? It's a bit like saying let's exclude all English plurals in -s because their meaning is obvious. I don't think it's the same as East Asian languages, just because words are not spaced at all there; in German there are spaces between (orthographic) words and so SOP compounds are felt more as distinct words. They certainly are by non-German speakers. Ƿidsiþ 09:29, 21 February 2011 (UTC)

I feel roughly the same. We all give our time voluntarily here; I suppose editors could be viewed as wasting their own time by creating these. I agree it would be like not allowing things like chlorineless (discussed in the tea room) or researchers because they are obvious from the sum of their parts. The principle goes beyond German; especially to Dutch but also to English, as pairs of short words often become single-word compounds in English like faceguard. So we would logically have to refuse some of these too. Mglovesfun (talk) 14:05, 21 February 2011 (UTC)

I agree with Widsith's stance, and, mostly, with his reasoning.—msh210℠ (talk) 16:13, 21 February 2011 (UTC)

I agree with Ƿidsiþ. - -sche 00:16, 22 February 2011 (UTC)

The only risk is for numbers or other infinite series: what would you think if a bot begins to create billions of entries for Italian and German numbers, and if Random entry returns a number for 99.999999 % of clicks? This is why I propose to exclude bots and to require the presence of several citations for entries belonging to infinite series. Lmaltier 18:46, 21 February 2011 (UTC)

I don't see why we should do by hand what could be done by bot. A bot can certainly check Google Books for sufficient hits on a word. If a word is valid, and a bot can make a suitable entry, then let it make an entry. If there are several billion attestable words for Italian and German numbers, then there should be an entry for each of those words, and better done by bot than hand.--Prosfilaes 00:54, 22 February 2011 (UTC)

If we want "Random entry" to return an English word / a "real" word, why do we allow so many conjugated forms of Spanish and Italian verbs? Their meanings are easy to guess, verb stem + inflection suffix. - -sche 23:32, 23 February 2011 (UTC)

(Disclaimer: I don't speak German, so all of my knowledge of it is secondhand. Some of this comment may well be misinformed, and it will be shocking if every detail of it is exactly correct. Hopefully folks who know better will correct my errors.) Overall I think Prince Kassad's proposed CFI are the way to go, but German seems to be a difficult case, because on the one hand:

It seems obvious that (pace some commenters above) something like "neuntausendneunhundertneunundneunzig" is not a single word. Anyone who knows any German at all will instantly recognize that it's a compound that happens to be written solid.
Old-fashioned writing, of the sort that uses long s (ſ) medially, clearly recognizes that there is a distinction between this sort of compound and a true word, in that short S (s) is used at the ends of words even in the middle of compounds.
Due perhaps to influence from English, there's a tendency among some younger speakers to write such compounds with spaces between the words (though not so strong a tendency as in some of the North Germanic languages).
A dictionary can only do so much to help readers figure out where one word ends and the next begins. In the end, any sort of comprehensive help can only be achieved by true software using a dictionary as its back-end database, not by a look-up dictionary alone.

while on the other hand:

Even someone who knows some German may not be able to tell where the word-breaks are. Since our target audience is people who speak better English than German, it seems a bit unhelpful to say that a certain word-sequence is NISOP and users have to go elsewhere for help identifying the P.
As Widsith points out, German is not like East Asian writing where the orthography is syllable-oriented rather than word-oriented. These compounds are not "words", but they are "orthographic words" in a writing system where that concept is fairly meaningful.
Such phrases are at least constituents — they're noun phrases — so are at least conceivably possible to create entries for. (I'm thinking here, by contrast, of Hebrew, where several prepositions and conjunctions are proclitics, attaching to whatever word happens to follow. At least the German compounds are syntactic and semantic phrases rather than meaningless word-sequences.)
Compounds often have internal modifications at the word boundaries. But of course, such internal modifications should be documented at the entries for the individual words, so maybe that's a non-issue.

All told, as I said, I think Prince Kassad's proposed CFI are the best approach; but I definitely see where other commenters are coming from!

—Ruakh_TALK 01:37, 22 February 2011 (UTC)

Just to clarify: German compounds are words rather than phrases (not just in an orthographical but also in a linguistic sense). The whole problem is that German allows for arbitrary combinations of concepts to be incorporated into one (compound) word. Just because many of them will be attestable doesn't mean that they're actually "lexicalized" words of German -- it's perfectly possible that most of those will just be ad-hoc creations that happen to have been created multiple times. I believe this is what makes German different from most polysynthetic languages where complex letter strings, even though written without spaces in between, are not considered words, but phrases (I might be wrong here, though). In German orthographic words and actual words almost always coincide. So, since there are potentially arbitrary amounts of unlexicalized words, I doubt that the "all words" rule can be successfully applied to German. Where to draw the line between includable and not includable words, though, unfortunately I don't know. Longtrend 20:52, 22 February 2011 (UTC)

Arbitrary combinations may be allowed, but I don't think there are so many attested compound words meeting attestation criteria. Lmaltier 21:26, 22 February 2011 (UTC)

I am fairly certain that I do not want this text to become part of any official CFI; it should be removed from Wiktionary:About German, as it is supported only by a minority of editors, from what I can see:

Criteria for inclusion are currently strongly debated among the community. The issue is which compound words are legitimately useful for a dictionary such as ours.

In the English language, anything written together is automatically permissible, but this rule does not adapt well to the spelling conventions of the German language. Instead, inclusion of compound words should be based on a number of criteria. These are not binding, but fulfillment or non-fulfillment of these criteria can determine the worthiness of any compound word for being included:

If the meaning of the compound word is not obvious by just looking at the individual compound members, the entry should almost certainly be included. This should take into account the knowledge expected from both native German speakers and, to a limited extent, learners of the German language, since these will be using the dictionary the most. For example, the meaning of the German term Baumschule cannot be guessed by knowing the two words Baum and Schule.
Certain terms have specific definitions in specialized dictionaries. These should therefore be included in this dictionary. This applies to terms such as Eigentumswohnung.
The following changes to a word do not affect their inclusion in any way:
- Usage of a filler phoneme like e or s, like in Bilderbuch
- Usage of merely the stem of a compound member, like in Wanderweg

I disagree with the core of what the text says: with treating, when determining idiomaticity, all German closed compounds the same way as English open compounds. In particular, I want to see "Kopfschmerz" included, rather than being treated as English "head ache" would be if there were no "headache". --Dan Polansky 09:26, 23 February 2011 (UTC)

On de.Wikt, we try to exclude "Spontanbildungen" (spontaneous constructions, nonce words) like Scheißkind, but we include compounds, because they are words. It is unfriendly to non-fluent speakers to delete words only because they are compounds. Consider Dachterrasse: is it Dachter + Rasse, or Dach + Terrasse? Consider Wachstube: a non-fluent speaker could try splitting it W + Achstube and see that this was not an intelligible split. She could then split it Wa + Chstube, then Wac + Hstube, both also visibly unintelligible. She could split it Wach (guard) + Stube (room) and, because this is intelligible, never understand that the word was in her context however truly "tube of wax" (Wachs + Tube). - -sche 23:32, 23 February 2011 (UTC)

Of course words like Wachstube which can be composed differently would be included. (Vollzug, with two vastly different pronunciations and meanings, is another one of these.) I just need to amend the proposed rules to reflect this. -- Prince Kassad 23:46, 23 February 2011 (UTC)

I above all think that you should not be posting proposed rules to a page that tries to track community consensus: Wiktionary:About German. You would do well to move your proposal somewhere else. --Dan Polansky 09:25, 24 February 2011 (UTC)

I have removed the offending section. For the sake of further discussion of the proposal, the section is still available in this revision. You could develop further proposals at User:Prince Kassad/German CFI or the like. --Dan Polansky 09:28, 24 February 2011 (UTC)

For a further track of community consensus on the subject, see also Talk:Zirkusschule to which an RFD on the term is going to be archived. --Dan Polansky 09:49, 24 February 2011 (UTC)

Model dictionaries: I have found a list of Duden entries on German Wiktionary: de:Benutzer:Ivadon/Duden/5. It includes such entries as "Himbeergeschmack", "Himbeerlimonade", "Himbeermarmelade", "kälteempfindlich" and "kokainsüchtig", all of which are semantic sums of parts with respect to the words contained in the compounds. --Dan Polansky 10:04, 24 February 2011 (UTC)

@-sche: Nobody proposes to exclude compounds altogether, I think. At least opaque compounds such as Angsthase (coward, literally "fear-rabbit") would of course be included. Longtrend 21:32, 24 February 2011 (UTC)

Why not cling to the attestability criteria? It would prevent the entry of made-up compounds and would keep guesswork out of interpreting German words and figuring out their English equivalents. After all, the number of compound nouns is not astronomical, and many of them would fulfil the "set term" criterion. See also RfD discussion for Zirkusschule. --Hekaheka 19:44, 24 February 2011 (UTC)

The attestability criteria sound very reasonable at first, however this would still include words that I'm sure nobody would want to include. I just made up the word Rechtsfenster, transparently meaning "window on the right side". Would anyone want to see that included as an entry? I can't imagine that. Checking Google Books, though, it turns out that this word seems to be attestable indeed. It gets 5 hits, all extremely context-dependent ad-hoc creations that are purely results of a totally productive word formation mechanism in German and thus not dictionary-worthy at all. Longtrend 21:32, 24 February 2011 (UTC)

I still agree with Ƿidsiþ (09:29, 21 February 2011) and Mglovesfun (14:05, 21 February 2011). Wiktionary's remit is "all words in all languages". A child learning to speak German natively or an adult learning German as an additional language may look up a long word. Wiktionary is not paper, so these words do not take up space that we could use for other entries — we can have all entries. What harm, then, does it do to allow all attestable words, including compounds? Some have expressed fear that the "Random entry" function will return only German numbers if someone adds a flood of these: but already the "Random entry" function returns only Spanish and Italian verbs, which are perhaps all "sum of parts" by the logic expressed above, because they are almost all clearly "verb stem + inflection suffix", but which are in any case no better random words than German numbers. And if we do want to forbid numbers, numbers are only a subset of all compound words, and should be discussed specifically (and perhaps without regard to language: Hekaheka points out English can have as many numbers as German). Why forbid general compound words like Tanzschule? Why forbid Rechtsfenster? Some have suggested that compounds are not words, but as Dan Polansky hints, the authorities on the German language disagree: even [the unattestable] Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz is (possibly) the longest word (das längste Wort) — it is a word — it is not "the longest words". If we are to make an exception to our policy of "all words in all languages" to forbid compounds, I would think there must be some exceptional reason; Tanzschule must do us some exceptional harm. What is that reason, what is that harm? - -sche 09:55, 7 March 2011 (UTC)

Short answer to your last question: I wouldn't exclude Tanzschule, since it's a conventionalized word. I would exclude Rechtsfenster, since it's nothing but an ad-hoc creation.

Do we have any policy about polysynthetic languages? I think that the "all words in all languages" rule, even if we assume that it works for German, will fail for those languages, unless you want to have entries for whole propositions that happen to be incorporated into one word. Not even the attestability proposal will work here since it will be hard to attest words from unwritten languages (which is true for many polysynthetic languages). IMHO, the argument that non-native speakers of German, when they see the whole compound word, are unable to comprehend which parts it's composed of, doesn't hold water either: Even now we assume the reader to know that the single parts of a multi-word term (e.g. put one's money where one's mouth is), i.e. the words, are meaningless parts of that term rather than contributing compositionally to the meaning of the term. Longtrend 23:23, 7 March 2011 (UTC)

Deprecating Category:English plurals and the like

On WT:RFDO#Category:Catalan noun forms it has been suggested to always use noun forms rather than plurals. See also Category talk:Plurals by language. One problem with plurals is that not only nouns can have plurals - this is true even of English, but even more so of many other languages such as French (adjective plurals) and even more so those with case systems like Russian, Latin where there are different kinds of plurals - nominative, genitive, instrumental, etc. Mglovesfun (talk) 12:13, 22 February 2011 (UTC)

I suggest renaming from Category:English plurals to Category:English noun plural forms. --Daniel. 12:21, 22 February 2011 (UTC)

Yes, not to disagree but to comment, that wouldn't align English with Category:Latin nouns forms [sic] and the other languages that use noun forms, not 'noun plural forms'. Mglovesfun (talk) 15:34, 23 February 2011 (UTC)

(I think you mean Category:Latin noun forms :p)

I'm not sure what the current position of other users is as to detailed categories of inflections, but Category:English noun plural forms would at least align with Category:German verb plural forms.

I, personally, oppose the existence of "Category:English plurals" per Mglovesfun's reasons and support the creation of either "Category:English noun plural forms" or "Category:English noun forms". --Daniel. 15:44, 23 February 2011 (UTC)

Translations for inflected forms

Do we want translations for cars, fights, paints etc.? If so, why? Mglovesfun (talk) 15:08, 23 February 2011 (UTC)

In my opinion, translations for inflected forms are not worthwhile, because, among other reasons... There isn't a one-to-one correspondence between English and foreign inflections. For example, a verb in simple past or past participle (among other inflections) may be translated into various cases, grammatical persons, etc. of other languages. It would not be very helpful to state that stopped can be translated into Portuguese parado, parada, parados, paradas, parei, paraste, parou, paramos, parámos, parastes, etc. To indicate the correct translation among them for each context, we would have to duplicate portions of inflection tables, which I expect to be more readable at the entries of lemmas. --Daniel. 15:32, 23 February 2011 (UTC)

See User talk:Stephen G. Brown#Translations of inflected forms and page history of "translations". Mglovesfun (talk) 15:36, 23 February 2011 (UTC) IFYPFY.—msh210℠ (talk) 15:56, 23 February 2011 (UTC)

I fixed the erroneous gender of the Portuguese translation of translations. Another reason for preferring the absence of translations for inflections is to avoid keeping track of the same information in various places like this. (though "keeping track of the same information in various places" is typically a task for templates)

Also, that translation table is incomplete, maybe deliberately. Where are the declensions of German and Swedish within the English entry? I currently have to go to Übersetzungen, then to Übersetzung to find a German plural genitive from translations. --Daniel. 15:55, 23 February 2011 (UTC)

What Daniel said (15:32, 23 February 2011 (UTC)), and to some extent what Ruakh said (15:45, 23 February 2011 (UTC)). I used to think having translations on a form-of entry was a good idea, but have come around.—msh210℠ (talk) 16:11, 23 February 2011 (UTC)

We should be translating lexemes, not wordforms. "Car" and "cars" are a single lexeme, and [[car]] is the appropriate place to provide translations for it. —Ruakh_TALK 15:45, 23 February 2011 (UTC)

Exactly. Ƿidsiþ 15:57, 23 February 2011 (UTC)

You could go further with the non-lemma forms; for example play could also mean "I play, you play, we play, they play, or play!", in which case possible French translations would be joue, joues, jouent and jouez. For -ir and -re verbs it could be more, as you could include subjunctive forms. Mglovesfun (talk) 15:59, 23 February 2011 (UTC)

Gets more complicated than that; traductions is the French translation for one sense, but also translations for the mathematical sense! So you'd need to change the definition from plural of translation to "more than one end result of translating text." Mglovesfun (talk) 11:52, 24 February 2011 (UTC)

When stopped needs translations (and synonyms), maybe this is a sign that it has gone from being just a past participle (a verb form) and really become an adjective in its own right. --LA2 13:13, 24 February 2011 (UTC)

My examples of translations of stopped were Portuguese verb forms, thus translations of an English verb form. --Daniel. 13:17, 24 February 2011 (UTC)

I split the entry translations after User:Stephen G. Brown reverted my initial removal, stating that there was "no consensus" for me to do so. Which makes me wonder what this discussion shows, just nothing? Mglovesfun (talk) 12:07, 28 February 2011 (UTC)

Ergo Wiktionary:Votes/pl-2011-02/Disallowing translations for English inflected forms.

Nobody is ever obligated to translate anything. We have always allowed people to translate the word, sense, and form they wanted and this has worked out well. There are very few translations of inflected forms here because people generally do not want to expend their energy on them. There are a few; for example, more, most, parents, including, bombing, interesting, barring, painting, dwelling, Ten Commandments, tired, accused, men, broken, Tamil Tigers, translations, spades, clubs, diamonds, hearts, Killing Fields, and people. These entries are all the more valuable as a result of their translations; some languages like French and Spanish have simple and predictable inflected forms, other languages are not so easy. The few inflected-form entries that we have where there are translations are the only place on the Internet where that stuff can be found in most cases.

I have a lot more faith in the judgment and experience of our linguists who decide for themselves which words, senses and forms they want to address than I have in Mglovesfun who just enjoys deleting useful pages and has no sense of their value. Deleting well-made pages that some people use or need always has bad effects, either on the users or on the contributors or both, even though the effects of lost work and resources are not easy to discern; on the other hand, deleting correct and well-formatted pages never brings any advantage at all, unless you count the thrill of deleting good information and hard work. —Stephen (Talk) 11:45, 7 March 2011 (UTC)

Most of the entries you cite have plural-only senses; you can't have 'a Ten Commandment' so that's not an inflected form, it's a plural only form. Mglovesfun (talk) 12:00, 7 March 2011 (UTC)

Lorem ipsum

Are all the words of lorem ipsum suitable as dictionary entries? If so, why? If not, why not?

Examples: lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget

--Daniel. 19:22, 23 February 2011 (UTC)

The community has already decided that they are not: see the old discussion.—msh210℠ (talk) 19:28, 23 February 2011 (UTC)

Some of them are words in languages, some of them seem not to be, as lorem ipsum isn't actually written in a 'language' as such. Mglovesfun (talk) 11:46, 24 February 2011 (UTC)

Well, of course, some of them are actual words. I assume Daniel was asking about those that are only in the lorem ipsum, or having a sense line because they're in it.—msh210℠ (talk) 16:45, 24 February 2011 (UTC)

That's what I meant too; I just couldn't be bothered coming back to expand on it. Mglovesfun (talk) 16:48, 24 February 2011 (UTC)

Norwegian headings

This was also asked on no:Wiktionary:Tinget#Norsk i en.wiktionary

We have many templates that make anchored links to entries based on the lang= parameter, e.g. {{form of|...|uno|lang=it}} will create a link to [[uno#Italian]]. But Norwegian (lang=no) has two standard orthographies, Bokmål (lang=nb) and Nynorsk (lang=nn), and three different headings. ==Norwegian== is used for sections describing words that are common to both Bokmål and Nynorsk, whereas ==Norwegian Bokmål== and ==Norwegian Nynorsk== are used for sections describing words that are unique to one variant. This creates a problem for form entries like husi, which is unique to Nynorsk, appears under a ==Norwegian Nynorsk== heading and links back to the main entry with {{inflection of|hus|lang=nn}}. The problem is that the main entry is common to both variants and thus appears under ==Norwegian==. Therefore, the template call needs to be {{inflection of|[[hus#Norwegian|hus]]|lang=nn}}.

As an experiment, I introduced a new template {{Norwegian}} that can be used for headings, always creating HTML anchors for all three variants, so you can link to bil#Norwegian, bil#Norwegian Bokmål and bil#Norwegian Nynorsk and always end up at the same place. It is currently used in the main entry bil#Norwegian and in the Nynorsk inflected form bilane. Is this an acceptable solution? Would it for example break language statistics if headings use a template like this? --LA2 11:38, 24 February 2011 (UTC)

Not a big fan of templated headers myself. Mglovesfun (talk) 11:47, 24 February 2011 (UTC)

This particular template, {{Norwegian}}, has at least three downsides: (1) it links to a draft policy (Wiktionary:About Norwegian) that contradicts the template, by saying "Norwegian entries should have a common L2 header, ==Norwegian=="; (2) it doesn't fix the discrepancy of identification of varieties of Norwegian that LA2 mentioned above; (3) all headers without templates also have anchors automatically, thanks to MediaWiki: see bil#Icelandic and bil#Faroese. --Daniel. 12:01, 24 February 2011 (UTC)

Yes, all headings have anchors. The problem is that ==Norwegian== in the page hus doesn't have the anchor for hus#Norwegian Nynorsk. In the most recent XML database dump, there were 6553 ==Norwegian== headings, 2401 ==Norwegian Nynorsk==, and 694 ==Norwegian Bokmål==. Should we enforce the draft policy and change these 2401+694 to the single standard heading? --LA2 12:16, 24 February 2011 (UTC)

While I'm not a speaker of Norwegian, the idea of merging these three headers into "Norwegian" seems good to me.

FWIW, we don't have language headers named "Hiragana Japanese", "Simplified Mandarin" or "American-spelled English" to indicate these written standards of these other languages. --Daniel. 12:32, 24 February 2011 (UTC)

I wrote some stuff some time ago on the policy talk page. I am not convinced that merging them to a common header is a particularly good solution - in fact, it is in some aspects easier to separate them completely, as tagging e.g. which synonym belongs to which standard can be a real pain in the rear end at times. Three different headers is a compromise between excessive tagging of words (one header) and creating almost two identical entries (two headers).

I should also point out that most of the differences between Nynorsk and Bokmål are not simply different spellings of the same words, as the use of the word orthography could indicate. For instance, the two words eg (Nynorsk) and jeg (Bokmål, both meaning 'I') do not have a common ancestor before the Proto-Norse word *eka! Njardarlogar 13:41, 24 February 2011 (UTC)

I see the reason for having a template generate anchors for all three languages. I don't see the point of having it link to About Norwegian, a page for editors. If you drop the link to About Norwegian, I'd like this idea, iff our major bot operators and dump analyzers assure us that they will still be able to do what they do even though the header is a template and not a raw language name. Otherwise, just use {{inflection of|hus|lang=no}} even in a Nynorsk entry.—msh210℠ (talk) 16:43, 24 February 2011 (UTC)

Don't the two standards also differ in vocabulary somewhat? I think it might be better to have just two headers. The duplication would make it explicitly clear that the word exists in both standards, because a lot of external sources treat 'Norwegian' as meaning the same as 'Bokmål', and so editors might assume the same. —CodeCat 22:40, 24 February 2011 (UTC)

(1) On no.wiktionary, only one heading (Norsk = Norwegian) is used, and the differences between Bokmål and Nynorsk are indicated next to each word (e.g. no:bil), just as our draft policy Wiktionary:About Norwegian suggests.

(2) But on nn.wiktionary, two separate headings are used (Bokmål and Nynorsk), even if the two sections have identical content (e.g. nn:kinesisk).

(3) What my template experiment tried was to preserve the current mess with three different headings. I'm starting to conclude that this is a bad choice.

(4) The fourth possibility is to have three more years of inconclusive discussion, and I think that is the worst outcome.

I think we should either have one or two headings, but not three. In support of one heading (1, the no.wiktionary model) is the existing draft policy and a pretty advanced template {{no-noun-infl}} that handles both variants in one common inflection table. Myself being a Swede and frustrated that Scandinavia has too many languages already, I'm tempted to go for a two heading solution, where the main alternative is Bokmål, using the heading Norwegian. This would reduce Nynorsk to a second rate dialect, similar to Scots, Limburgish, or our historic entries for Old English. Even though this would be an Alexandrian solution to this Gordian knot, I'm sure such a suggestion would make many Norwegians bitter or angry, so perhaps one common heading is the better alternative? --LA2 01:12, 25 February 2011 (UTC)

The last "solution" that you put up there is totally unacceptable in many ways (it's equivalent to labeling Swedish as 'Scandinavian' due to its size and the rest as 'regional dialects').

CodeCat: yes, the vocabulary is also different. Many important/frequent words are different (though they often have the same Old Norse roots). Njardarlogar 10:12, 25 February 2011 (UTC)

I think taking into account what would make people angry would be a violation of NPOV. What do you mean by "second rate dialect"? I don't see anything in common between our coverage of the languages you listed, so I don't understand what you're referring to. --Yair rand (talk) 11:57, 25 February 2011 (UTC)

I think we need to look at this from a usability point of view. Most people who look up words on Wiktionary already know two things. They know the word they're trying to look up, and the language it's in. So, someone who wants to look up German das will first find the entry 'das' and then look for a German section. But for Norwegian it's a bit more complicated right now, because someone who wants to look up a Nynorsk word will have to look not just for a 'Norwegian Nynorsk' section but also possibly a 'Norwegian' section. And to make matters more confusing, entries like som contain both... —CodeCat 13:47, 25 February 2011 (UTC)

For a solution with two headings (2 above), there is the question of what those headings should be. As Njardarlogar says above, using "Norwegian" as a heading for Bokmål is unacceptable to many, so we would have to ban that heading, only using "Norwegian Bokmål" and "Norwegian Nynorsk". I think this is yet another argument for the one-heading solution (1 above). Do we have consensus for solution (1)? --LA2 14:09, 25 February 2011 (UTC)

Please do not use templates in headings — it makes it significantly harder to work out what's going on (particularly if they take arguments and do magic — the worst offender is the "abbreviation" thingy). If we want to have one unified heading for Norwegian, then the text "Norwegian" does that fine [there are much better ways of distinguishing dialects, {{UK}} and {{US}} for example]. In common English parlance (and this is a dictionary for English speakers) both languages are called Norwegian. This ties closely into index creation — at the moment I'm using the headings to determine which words belong to which language, and to find translations by looking for * [language name]: {{t|. It's not the end of the world, and I could write code to fix it for Norwegian [and then for BCS [and then for ...]]; but every single person who wants to get information out of Wiktionary will have to fix it too. On a related note, what should Index:Norwegian contain? All of Norwegian Nynorsk, Norwegian Bokmal and Norwegian; or just Norwegian and have Norwegian Nynorsk and Norwegian Bokmal as separate indexes? Conrad.Irwin 19:33, 26 February 2011 (UTC)
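
To make the index-building point concrete, here is a minimal Python sketch of heading-based extraction. It assumes the page's wikitext is already available as a string; the function and pattern names are hypothetical, and the patterns only cover the simple `==Language==` and `* Language: {{t|` forms described in the comment above, not every layout found in real entries.

```python
import re
from typing import List

# Level-2 language headings, e.g. "==Norwegian==". Deeper headings such as
# "===Noun===" are rejected because the third character must not be "=".
LANGUAGE_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)

# Translation lines of the form "* Norwegian: {{t|no|...}}".
# The captured language name may not contain ":", "*" or a newline.
TRANSLATION_LINE = re.compile(r"^\*\s*([^:*\n]+?):\s*\{\{t\|", re.MULTILINE)


def language_sections(wikitext: str) -> List[str]:
    """Return the language names of all level-2 headings, in page order."""
    return [m.group(1).strip() for m in LANGUAGE_HEADING.finditer(wikitext)]


def translation_languages(wikitext: str) -> List[str]:
    """Return the language names of translation lines that use {{t|...}}."""
    return [m.group(1).strip() for m in TRANSLATION_LINE.finditer(wikitext)]


page = """==English==
===Noun===
# [[water]]
====Translations====
* Norwegian: {{t|no|vann}}
* Swedish: {{t|sv|vatten}}

==Norwegian==
===Noun===
# water
"""

print(language_sections(page))      # ['English', 'Norwegian']
print(translation_languages(page))  # ['Norwegian', 'Swedish']
```

A split into ==Norwegian Bokmål== and ==Norwegian Nynorsk== would simply show up as new names in the first list; any consumer that maps those names to languages has to be taught about them, which is the extra work the comment warns about.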

On no.wiktionary (which uses model 1), the category:Norwegian nouns has subcategories for nouns in Bokmål and Nynorsk, but the vast majority of nouns are in the main category. I think one common Index would be the way to go. --LA2 01:23, 27 February 2011 (UTC)

We shouldn't reach a final conclusion before we have setups properly defined. The consequences of a merger are at present largely unknown. If we are going to settle the issue "once and for all", we have to do a decent job. There is no need for haste.
@CodeCat: I do not see how entries like som are more confusing than having to e.g. check out different etymologies. Njardarlogar 16:49, 3 March 2011 (UTC)

What do you mean by "setups properly defined"? What is missing? The proposal is to follow what the existing draft policy says, and not create any new entries with headings other than ==Norwegian==. It's your creation of non-standard entries that should be slowed down. --LA2 20:22, 3 March 2011 (UTC)

That policy is indeed a draft. The details should be discussed - do people actually agree with them? Do they follow the standard framework of the English Wiktionary?

Furthermore, I myself oppose a one-header solution; I think it is more accurate to include the label "Bokmål" or "Nynorsk" in the header rather than beneath it. The two written forms do, after all, not fully converge on some sort of middle point. Njardarlogar 13:30, 4 March 2011 (UTC)

Ehm, discussing this issue is exactly what we have been trying to do here. I think we all agree that having just one heading, which is what the draft policy says, is very well in line both with the rest of English Wiktionary and with the Norwegian (Bokmål) Wiktionary. As far as I can see, you are the only one against this. But except for your personal opinion, what arguments do you have? Earlier, when other users have asked on your user talk page that you should follow the existing draft policy, you have also refused to do so by only referring to its draft status and not by contributing any good arguments for not accepting it. --LA2 00:58, 5 March 2011 (UTC)

My question is how many people have actually read WT:ANO. Getting support for the number of headers is one thing, but there are more details to consider. For instance, does layout no. 3 look good? (I suppose that is the layout that is actually going to be used.)

Here's another test of the proposed layout. It may not illustrate all of the consequences of the draft policy setup, however; one important thing is that there are quite a few entries that will receive this vague tagging on the inflection line. The Nynorsk part is almost invisible. Parallels have been drawn to British English versus American English; however, I do not think it is too grave an offense to write customize in a text otherwise using British spelling. In comparison, using certain Nynorsk words in a Bokmål text or vice versa would be equivalent to using Swedish words in a Danish text in the minds of many. I therefore find the vague tagging that follows from the one-header solution a poor solution, as people may more easily misinterpret our entries.

I was asked to follow a draft policy which nobody have agreed upon! Naturally, I would not comply (please also notice the dates of these two edits [8] [9], and who is making them. Conclusion: things are not as set in stone as has been claimed. This is also clear by reading the talk page at WT:ANO). Njardarlogar 09:30, 5 March 2011 (UTC)

As you can see from the first line of this discussion, I have tried to invite opinions from the Norwegian (Bokmål) Wiktionary to this discussion, but their reaction was to ask what the problem is, because their Wiktionary follows just what our draft policy proposes (one single heading) and it works great for them. I didn't ask the Nynorsk Wiktionary, because it is almost dead with no daily activity. You are of course free to invite more opinions.

Yes, I think your example of "layout 3" looks great. The examples (tjørn, vatn, draum) are somewhat extreme, just like a good example should be, in illustrating differences between Bokmål and Nynorsk. If (Bokmål) and (Nynorsk) are sprinkled all over the example, it is not going to be much worse than this, because many words are common to both variants of Norwegian. In many cases, quotations and example sentences will need to be taken from Ibsen, Bjørnson and older writers which predate any standardized Riksmål/Bokmål, and from Vinje/Aasen which predate modern standardized Nynorsk, some bordering on either pure Danish or dialects, so the year of the quotation will say more than the Bokmål/Nynorsk label. This is no different from Danish and Swedish, which also use old spelling and grammar in some example sentences.

Your attitude of "I was asked to follow a draft policy which nobody have agreed upon! Naturally, I would not comply" gives me the impression that you will on principle disobey any proposed policy. So perhaps I should just update it to recommend two headings, and you will voluntarily start to use a single heading? This is now all about you and your attitude, and not about what's best for the language and Wiktionary. --LA2 13:32, 7 March 2011 (UTC)

No, it isn't. You keep dragging personal elements into this, also known as ad hominem. Stop doing it, and stick to the bloody arguments that are being put forward.
If someone creates a draft, it does not automatically follow that people should follow it; it is a draft, after all. What follows is debate, which is where we still are almost 3 years on. Very little argumentation has been put forward in this debate that explains what the problems with a three-header solution are. What actually are the problems? A three-header solution is the simplest way to accurately label the two language forms without having to create almost-duplicate entries.
I am, by the way, worried by the fact that primarily non-native speakers of Norwegian have participated in the debate here so far. Njardarlogar 17:09, 7 March 2011 (UTC)

Good, let's start all over with the very basics. The drawback with having three different headings is that other articles link to pagename#Norwegian (either explicitly as [[pagename#Norwegian]] or through some template with {{...|lang=no}}) but suddenly the page changes and it no longer has that heading, but separate headings for Norwegian Bokmål and Norwegian Nynorsk, so all other articles linking to it need to be updated. The same happens when a red link goes to pagename#Norwegian_Nynorsk but the article is later created with a common heading for ==Norwegian==. Most mechanisms on Wiktionary assume one heading for one language. For example, that is how we organize categories and count statistics on how many words we have per language. With three headings we need to count Bokmål as the sum of Norwegian + Norwegian Bokmål and Nynorsk as the sum of Norwegian + Norwegian Nynorsk. The linking problem was what I tried to solve by introducing a template (top of this discussion) that created HTML anchors for all three names, so a link would always find something. All agreed that this was a bad solution and the template is now deprecated. All seemed to agree that we should go for either one (Norwegian) or two headings, but not three. I have repeatedly asked you why two headings are better, and all you have responded is that the existing draft policy is still a draft. This is not an argument for why two headings is better than one. The only consistent criticism is that you are against a single heading. That summary is not an ad hominem attack.
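
The counting rule in the comment above can be written out as a short sketch. It assumes per-heading section counts have already been collected (for example from a database dump); `variant_totals` is a hypothetical helper, and the counts in the example are invented for illustration only.

```python
from collections import Counter
from typing import Dict


def variant_totals(heading_counts: Counter) -> Dict[str, int]:
    """Turn per-heading section counts into per-variant totals.

    Under a three-heading scheme a ==Norwegian== section is shared by both
    written standards, so it is added to both the Bokmål and the Nynorsk
    total. A Counter returns 0 for headings that never occur.
    """
    shared = heading_counts["Norwegian"]
    return {
        "Bokmål": shared + heading_counts["Norwegian Bokmål"],
        "Nynorsk": shared + heading_counts["Norwegian Nynorsk"],
    }


# Invented counts, for illustration only.
counts = Counter({"Norwegian": 100, "Norwegian Bokmål": 20, "Norwegian Nynorsk": 5})
print(variant_totals(counts))  # {'Bokmål': 120, 'Nynorsk': 105}
print(sum(counts.values()))    # 125
```

Because the shared sections are counted twice, the two totals (225 here) cannot simply be added to get the number of Norwegian sections (125), which illustrates the comment's point that most mechanisms assume one heading per language.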

I agree that more Norwegians should enter this discussion. It's sad that so few do. I have tried to invite more people, and I suggest you do the same. --LA2 19:18, 7 March 2011 (UTC)

I have presented my main argument several times without it receiving any feedback, and it has nothing to do with drafts. Yes, I am aware of the problem with linking. It may, however, be solved through other methods. One problem, though, is that the string functions are not going to be implemented, making very simple tasks challenging. Regardless, people having to scroll down the page rather than arriving right at the correct entry is not the end of Wiktionary.

The second problem is also one of practicality, one that does not greatly affect the usability of the English Wiktionary (not to mention that if we only used one header, we would get huge counts for Norwegian compared to Danish and Swedish, and for no good reason, remember). What is certainly confusing, though, is the vague labelling on the inflection line that so many entries would receive with only one header. If someone not familiar with the fact that there are two written standards of Norwegian sees (Bokmål) or (Nynorsk) on the inflection line, would they even know what the tag means, or why it is there? If Bokmål or Nynorsk is in the header, it will be clear that we are dealing with two separate written standards.

To sum up: a complete separation is impractical, whereas just one header is too vague. In my view, we must either separate them completely or partially. Njardarlogar 20:05, 7 March 2011 (UTC)

Fine then, do we have consensus for splitting Norwegian into two separate headings ==Norwegian Bokmål== and ==Norwegian Nynorsk==? It can certainly be done, since we already do separate headings for Swedish, Danish, and Icelandic. After the split, lang=nn will refer to Norwegian Nynorsk and lang=nb to Norwegian Bokmål. But what should we do about entries that refer to lang=no? Should that be treated as an error?

There are currently 6,500 entries for Norwegian, 80,000 for Swedish and 400,000 for Italian, so I wouldn't worry too much about flooding Wiktionary with too many Norwegian entries. If we get enough contributors, we should have half a million entries (including form entries) for each of these languages and maybe two million for Finnish (which has more inflected forms). --LA2 21:05, 7 March 2011 (UTC)

I think both could work, but it's certainly easier to split them (for any Wiktionary other than no.wikt). no.wiktionary.org is also the only Norwegian dictionary that doesn't split Bokmål and Nynorsk; every other Norwegian dictionary I know of is written in and for one of the standards. So, I think you should do the same here at en.wikt.

I also think that you should use no and not nb as the language code. I think it will be the most logical for those who are not very interested in the politics around this and just want to contribute (especially new users). Norwegian Bokmål is in many ways the major standard of Norwegian (for example, if you learn Norwegian as a second language, you will most probably learn this standard), and no is a code that many know or can guess, since it's a common code for both Norwegian and Norway. Also, nn is common for Norwegian Nynorsk, and most people, especially those who contribute in Nynorsk, will know this code. Mewasul 09:54, 8 March 2011 (UTC)

If we end up splitting them, we'll need an easy way to make sure that any Bokmål terms that also exist in Nynorsk get their own entries too. Maybe a bot could periodically check entries that have only Bokmål and make a list of them, so that any Nynorsk users can add the Nynorsk forms of those words as well? —CodeCat 11:13, 8 March 2011 (UTC)
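
A minimal sketch of the report suggested here, assuming a bot has already extracted the level-2 headings of each page into a dictionary. The function name and the page data are hypothetical. The report only lists pages that lack a Nynorsk section; it cannot tell whether the word actually exists in Nynorsk, so its output is a list of candidates to check, not entries to create.

```python
from typing import Dict, List


def missing_nynorsk(pages: Dict[str, List[str]]) -> List[str]:
    """Return titles with a Norwegian Bokmål section but no Norwegian Nynorsk one."""
    return sorted(
        title
        for title, headings in pages.items()
        if "Norwegian Bokmål" in headings and "Norwegian Nynorsk" not in headings
    )


# Hypothetical page -> headings data, as a bot might extract it.
pages = {
    "bok": ["Norwegian Bokmål", "Norwegian Nynorsk", "Swedish"],
    "hus": ["Danish", "Norwegian Bokmål", "Swedish"],
    "ikke": ["Danish", "Norwegian Bokmål"],
    "ikkje": ["Norwegian Nynorsk"],
}
print(missing_nynorsk(pages))  # ['hus', 'ikke']
```

Both hus and ikke are flagged, although only hus is also used in Nynorsk (the Nynorsk counterpart of ikke is ikkje), so a person still has to review each item.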

First: the codes that are used need to represent their languages, anything else is not NPOV in this context. Same goes for language names. Furthermore, the dialects would have to be treated as "Norwegian" - one would think.

There are quite a few words and spellings that are unique to either language form, so creating such a list would largely be useless; not to mention that its efficiency will continuously drop as time passes - if I understand your idea correctly. Njardarlogar 12:42, 8 March 2011 (UTC)

Is it possible for the bot to know in advance which words are shared by both languages? If it is, then it could use that to make its list. —CodeCat 13:31, 8 March 2011 (UTC)

I'm not sure how we could tell a bot that. Also be aware that words that are both Nynorsk and Bokmål could be used much more in one of the language forms than in the other, meaning that the effort may best be made on words exclusive to one language form. The fact that a lot of the most common/important words already exist under "Norwegian" headers anyway, which would simply have to be split up if we were to do this, further strengthens the idea. With all that said, it may well be possible and worth the effort if it is done the right way, but I don't know how that would be. Njardarlogar 18:08, 8 March 2011 (UTC)

I think you are mistaken, CodeCat. If we are to treat Norwegian Bokmål and Norwegian Nynorsk as two separate languages, then they don't need to be synchronized any more than Danish and Swedish. One Bokmål contributor writes entries for all kinds of fruit, another Nynorsk contributor writes entries for all kinds of fish, a third Swedish contributor writes entries for all kinds of bread. Isn't that how Wiktionary works? --LA2 01:42, 12 March 2011 (UTC)

I conclude from this lengthy discussion that two separate headings should be used: ==Norwegian Bokmål== and ==Norwegian Nynorsk== and with time all existing ==Norwegian== sections should be split up or changed into these two. I intend to update the existing draft policy Wiktionary:About Norwegian with this information. It will still be a draft policy until we decide to make it formal. But for now, the trend is going towards two separate headings instead of one common. --LA2 20:36, 14 March 2011 (UTC)

The outcome is pretty vague for now; I do not think it makes sense to touch older entries before we've finally settled on something (though of course, all the entries marked as Norwegian but with no further specification need to be fixed, which is something that I have been working on for a while). I invited the users user:EivindJ, user:Kåre-Olav and user:Meco in an attempt to get more feedback. Njardarlogar 09:18, 15 March 2011 (UTC)

Sure it is vague, but we can change that by taking action. We need to lift Norwegian from being the 34th biggest language here to some more prominent position. Let's say Bokmål (just like Danish) should be among the 20 biggest and Nynorsk (just like Icelandic) among the 30 biggest. That is a huge lift for both variants, but we can do that. I'll track the statistics at Wiktionary talk:About Swedish#Statistics. --LA2 15:46, 15 March 2011 (UTC)

No italics for Latin/French/Japanese words?

Hello, I am here to put my two cents' worth of knowledge: In my native language, Spanish, we often put Latin/French in italics, but not Japanese, so I was shocked when I read the following sentence:"Loanwords and borrowed phrases that have common usage in English—Gestapo, samurai, vice versa, esprit de corps—do not require italics. A rule of thumb is not to italicize words that appear unitalicized in major English-language dictionaries." I would change the wording to "Some words do require italics" because for example, many people nowadays know what samurai is, but not what Gestapo or esprit de corps mean. Thus I will seek either removal or rewording of that section of the Wikipedia Manual of Style. — This unsigned comment was added by Fandelasketchup (talk • contribs).

WF? Mglovesfun (talk) 13:37, 24 February 2011 (UTC)

Probably. --Daniel. 13:40, 24 February 2011 (UTC)

Appendix:Elfen Lied, etc.

I redesigned some appendices of fictional terms to display lists of terms, their definitions, and their inflections. For example, see Appendix:Elfen Lied. --Daniel. 19:12, 24 February 2011 (UTC)

Deleting categories for derived terms

I would like to see the categories for derived terms deleted, and the concept discontinued. These are categories located in Category:English derived terms, such as Category:English words derived from: horse or Category:English words derived from: cube (noun). Related templates include {{derv}}. These seem to be a result of an initiative of DCDuring from September 2010. IMHO derived terms are better placed directly in the section "Derived terms", as is the prevailing practice. Many of these categories have very few members, such as one member. Does anyone else feel the same? --Dan Polansky 17:43, 25 February 2011 (UTC)

The category Category:English derived terms is meaningless. Category:English words derived from: horse is not meaningless, but useless if the Derived terms section is present. Lmaltier 17:46, 25 February 2011 (UTC)

I don't agree; these categories are relatively empty because they are new, with few users working on them. It's not what I do, however; for related terms/derived terms, I often point to a 'central' entry, for example for homeless I would do

====Related terms====
* see {{term|home|lang=en}}

This bypasses the category altogether. Mglovesfun (talk) 17:50, 25 February 2011 (UTC)

I am not saying that they should be deleted because they have only a few members. I am saying that they are redundant to the section "Derived terms", unless the section is emptied and its content is moved to a category. I want to see the list directly in the section "Derived terms"; I do not want to see the section emptied. --Dan Polansky 17:56, 25 February 2011 (UTC)

Sure, delete 'em; it won't bother me. Mglovesfun (talk) 17:58, 25 February 2011 (UTC)

The problem with populating Derived and Related terms manually is that the result bears no relation to the entries that we have: it contains many red links and misses many entries that we do have. Etymology-section templates could populate categories which would be available for use in creating derived and related terms lists on demand. The effort foundered because it surfaced the still-unresolved issue of what we mean by derived terms and, to a lesser extent, related terms. Is derivation a synchronic or diachronic process for our purposes in English, in synthetic languages, in poorly attested languages? As long as the conceptual issues remain unresolved, we should probably not allow any automated procedure to populate the Related and Derived terms sections. Better that we let the varied judgments of contributors populate the sections, without guidelines and with uncertain results, but slowly.

If a user wants to see the Derived terms directly in the entry instead of in a category, we have {{rel-top}} to hide the heavily populated sections from those who don't want the section to take up the whole screen. DCDuring TALK 18:18, 25 February 2011 (UTC)

For lots of Swedish words, I have used the templates {{compound}}, {{prefix}}, {{suffix}} and {{confix}} to explain how a word was put together. This goes under the Etymology heading. These templates categorize the page in e.g. Category:Swedish words suffixed with -sam, which is also featured under -sam#Swedish using {{suffixsee}}. But compounds are put in the huge and quite useless Category:Swedish compound words instead of separate categories for each component word. It is easy to imagine the alternative, that each compound component would get its own category, just like the prefixes and suffixes do. I don't believe this would be useful, however.

Now, words are formed in many steps. For example, ofullbordad = o- + ((full + borda) + -d). All three steps (1. fullborda, 2. fullbordad, 3. ofullbordad) contain "full", but only the first step uses {{compound|full|borda}}. The second step uses {{sv-verb-form-pastpart|fullborda}}. The third step uses {{prefix|o|fullbordad}}. Thus, if {{compound}} were to create individual categories, only fullborda would show up in the category for words derived from full. --LA2 21:11, 25 February 2011 (UTC)
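
The multi-step formation of ofullbordad can be modelled as a small graph in which each entry records only its immediate etymology template. This is a hypothetical Python sketch, not how the templates actually work: a category filled only by {{compound}} sees just the first step, while following every step transitively also reaches fullbordad and ofullbordad. The data comes from the example in the comment above; the data structure and function names are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Each entry records only its immediate etymology: (template name, components).
ETYMOLOGY: Dict[str, Tuple[str, List[str]]] = {
    "fullborda": ("compound", ["full", "borda"]),
    "fullbordad": ("sv-verb-form-pastpart", ["fullborda"]),
    "ofullbordad": ("prefix", ["o", "fullbordad"]),
}


def compounds_with(word: str) -> List[str]:
    """Entries whose own {{compound}} call names `word` directly."""
    return sorted(
        title
        for title, (template, parts) in ETYMOLOGY.items()
        if template == "compound" and word in parts
    )


def all_derived_from(word: str) -> List[str]:
    """Entries that reach `word` through any chain of etymology templates."""
    found: Set[str] = set()
    changed = True
    while changed:  # repeat until no new entry is added (a fixed point)
        changed = False
        for title, (_, parts) in ETYMOLOGY.items():
            if title not in found and any(p == word or p in found for p in parts):
                found.add(title)
                changed = True
    return sorted(found)


print(compounds_with("full"))    # ['fullborda']
print(all_derived_from("full"))  # ['fullborda', 'fullbordad', 'ofullbordad']
```

The transitive version is the one that would match an intuitive "derived from full" list, but it depends on every intermediate entry having an etymology section with a template.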

The order in which morphemes have historically combined within a language can give one answer: the diachronic one. There are problems with a lack of sufficient historical evidence in many languages. Or one could break a word into all the morphemes that someone could use to reconstruct it, possibly using only currently productive affixes, affixes that have been productive at some time in the language, or affixes that users analyze as having a meaning, though the affix has not been productive in the language. Any of these latter are a more synchronic approach. The synchronic approach is better for generating comprehensive lists of related terms. The diachronic approach is better for lists of derived terms, but could be pressed into service for related terms, albeit awkwardly. DCDuring TALK 22:49, 25 February 2011 (UTC)

As I often say, the question is not whether such terms exist or whether it is worth listing them, but whether it is worth listing them in a category. Mglovesfun (talk) 23:19, 25 February 2011 (UTC)

Without repeating too much of my last comment, the issue to me is whether we trust the process of manually creating related terms and derived terms lists:

They are often riddled with redlinks
But remain incomplete.
They sometimes have nothing whatsoever to do with a legitimate concept of what derived or related terms might be.
At other times they are simply quirky.

The existing derived terms lists can be systematically used to visit the blue entries to make sure that they have an etymology section with the appropriate templates. We can decide whether the redlinked derived terms should be retained as is, converted to black links, or deleted. The process of adding missing etymology sections will lead to ever more complete derived and related terms. The definition of related terms can be refined and made more complete to include all words with shared stems and generated from appropriate categories.
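
The audit described in the comment above can be sketched as three set comparisons. This assumes three inputs a bot would have to gather: the terms in a Derived terms section, the set of existing pages, and the members of a category populated by etymology templates. The function name and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def audit_derived_terms(
    listed: Iterable[str], existing_pages: Set[str], category_members: Set[str]
) -> Dict[str, List[str]]:
    """Compare a manual Derived terms list with existing pages and a category."""
    listed_set = set(listed)
    return {
        # Red links: listed terms with no entry yet.
        "redlinks": sorted(listed_set - existing_pages),
        # Blue links whose entry does not put itself in the category,
        # i.e. probably lacking an etymology section with the right template.
        "needs_etymology": sorted((listed_set & existing_pages) - category_members),
        # Category members that the manual list has missed.
        "missing_from_list": sorted(category_members - listed_set),
    }


report = audit_derived_terms(
    listed=["horseback", "horsefly", "horsewhip"],
    existing_pages={"horseback", "horsefly", "horsepower"},
    category_members={"horseback", "horsepower"},
)
print(report)
# {'redlinks': ['horsewhip'], 'needs_etymology': ['horsefly'], 'missing_from_list': ['horsepower']}
```

Each bucket maps onto one of the follow-up actions in the comment: decide what to do with red links, add etymology templates to blue entries, and extend the list with missing terms.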

The process of manual construction and maintenance of such lists is a quaint one, which I doubt is being undertaken by very many contributors. It is because I have attempted it in a few cases that I would favor the gradual implementation of a category-based population of Derived and Related terms over the do-nothing alternative, which is what the manual approach amounts to. DCDuring TALK 23:49, 25 February 2011 (UTC)

It would really be nice if we could automatically generate lists on a page based on the contents of a category! But I don't think there is an easy way to do that... —CodeCat 00:25, 26 February 2011 (UTC)

The templates {{prefixsee}} and {{suffixsee}} are quite good. --LA2 13:23, 26 February 2011 (UTC)

Manual collection of derived terms has worked fairly well for me, with the help of the what-links-here function and the search function. I have populated many sections of derived terms in the past. I was not alone in doing so; I have seen many lists created by other people. From what I can see, the use of {{derv}} does not significantly simplify the process of identifying derived terms: you first need to know to which entries to add the template. The manual process can be enhanced using a bot, without the need to resort to categories. The manual process does not in any way amount to "doing nothing"; it is the easy way of setting up new technological devices and then waiting that amounts to doing nothing. If you say that the categories have proved very helpful to you, can you point me to several such categories that you have populated and that were not already populated before in the Derived terms section? --Dan Polansky 18:23, 3 March 2011 (UTC)

I stopped after experimentation due to the lack of interest in clarifying the meaning of related and derived terms. In the absence of such clarification, there is much to recommend a hobbled manual process of populating such headings. I would think we would not want, for example, to alter {{compound}}, {{suffix}}, {{prefix}}, and {{confix}} to autopopulate the relevant categories. Such a modification would quickly (enough time to populate the missing category list) provide us with ample material to compare against our manually created derived- and related-terms lists. DCDuring TALK 19:42, 3 March 2011 (UTC)

(unindent) Can you point me to several such categories that you have populated and that were not already populated before in the Derived terms section? Or is it true that there are no categories that demonstrate the usefulness of the technique that you have introduced? --Dan Polansky 09:06, 7 March 2011 (UTC)

As a follow-up, I have tagged {{derv}} as in dispute. MG has sent it to RFDO, so the discussion and voting follows there: WT:RFDO#Template:derv. --Dan Polansky 16:04, 17 March 2011 (UTC)

Disallowing templates that need languages from defaulting to English

Right now, there are quite a few templates that automatically assume the language at hand is English if nothing else is specified. I think this is a bit biased, but that's not really my point in this case. If the default is English, forgetting to specify the language will inevitably mean that the entry gets added to an English-specific category. This is of course not what we want, but because of the default option it's very hard to catch errors like that unless someone happens to spot the out-of-place entry. So for practicality alone I think it would be better if this default behaviour is removed, and an explicit lang=en is needed.

Linked to this proposal is the naming of topical categories. Currently, English categories do not get a language prefix, but other languages do. I think it would be better if English topical categories were prefixed with en:.

There are some templates such as {{term}} where the language is optional, but the default is not English but a generic case for all languages. This proposal does not affect those; such templates can keep working as they always did. —CodeCat 14:56, 26 February 2011 (UTC)

To illustrate the above, 黃道吉日 is currently in Category:English idioms as it uses {{idiomatic}} with no language given. Mglovesfun (talk) 15:00, 26 February 2011 (UTC)

I think that operating like {{term}} is ideal; I agree that defaulting to English is rarely the best option. There may be cases where it does make sense, but in general I agree. - [The]DaveRoss 15:03, 26 February 2011 (UTC)

Not all templates can work like {{term}} though. There are some templates that add pages to a category, but what category should they add pages to if they don't know the language? In that case, some kind of requests category would be better than English. —CodeCat 15:15, 26 February 2011 (UTC)

Ones that really require languages could default to a "needs a language" category. For cleanup. - [The]DaveRoss 15:18, 26 February 2011 (UTC)
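
The difference between the current behaviour and the fallback suggested here can be shown with a small Python model of a category-adding template such as {{idiomatic}}. This is not MediaWiki template code; the language table is a tiny sample, and the cleanup category name is hypothetical.

```python
from typing import Optional

# Tiny sample of language codes; the real list is much longer.
LANGUAGE_NAMES = {"en": "English", "sv": "Swedish", "zh": "Chinese"}

# Hypothetical name for a cleanup category.
NEEDS_LANGUAGE = "Category:Entries needing a language code"


def idiom_category_current(lang: str = "en") -> str:
    """Current behaviour: an omitted lang= silently means English.

    (Unknown codes are out of scope for this sketch and raise KeyError.)
    """
    return f"Category:{LANGUAGE_NAMES[lang]} idioms"


def idiom_category_proposed(lang: Optional[str] = None) -> str:
    """Proposed behaviour: a missing or unknown lang= goes to a cleanup category."""
    if not lang or lang not in LANGUAGE_NAMES:
        return NEEDS_LANGUAGE
    return f"Category:{LANGUAGE_NAMES[lang]} idioms"


# An entry such as 黃道吉日 that calls the template without lang=:
print(idiom_category_current())       # Category:English idioms
print(idiom_category_proposed())      # Category:Entries needing a language code
print(idiom_category_proposed("zh"))  # Category:Chinese idioms
```

With the fallback, a forgotten lang= shows up in one place for cleanup instead of being mixed into an English category.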

It is trivially easy to go to the end of any English category and identify a large number of misclassified items in other scripts. We have also had bots that generated cleanup lists for items misclassified in many ways, including this one. Why is this proposal necessary to solve the stated problem? This is the English-language Wiktionary, for which we still hope to recruit native English contributors to expand our coverage of terms and update our creaky Webster 1913 entries. Making it easy for them seems like a good thing. DCDuring TALK 18:22, 26 February 2011 (UTC)

A reason for using the prefix en: is mentioned here. - -sche 19:48, 26 February 2011 (UTC)

IMO {{context}} tags, specifically, should not require lang=en, as it takes up pre-definition real estate on the definition line, and KassadBot or a new bot should be employed to add lang to tags in non-English sections to prevent/remove miscategorization as English. However, IMO most other templates can and perhaps should do as proposed by CodeCat in this section.—msh210℠ (talk) 15:12, 27 February 2011 (UTC)

I've created a vote on this now, since it would probably be best to get a proper consensus before this is made policy: Wiktionary:Votes/pl-2011-03/Default language of templates that require a language —CodeCat 12:33, 11 March 2011 (UTC)

Pronunciation for Word of the Day for 2011-02-27 ("endeavour")

The audio file for today's Word of the Day, "endeavour", rhymes with "beaver". The audio file should either be removed or be replaced with the pronunciation from the page for "endeavor". 68.9.112.195 00:58, 27 February 2011 (UTC)

I see you've figured out how to do that yourself. :-) —Ruakh_TALK 01:17, 27 February 2011 (UTC)

Wiktionary:Word_of_the_day/February_27 is protected from editing, so I wasn't able to change the audio file for that page. I was able to change only endeavour because it wasn't protected. The parameter "audio=en-us-endeavor.ogg" should be added to the end of the wotd template on Wiktionary:Word_of_the_day/February_27. 68.9.112.195 01:36, 27 February 2011 (UTC)

Done —Internoob (Disc•Cont) 04:55, 27 February 2011 (UTC)

We really need to try harder with regard to WOTD; there have been lots of them recently which have had pretty glaring issues. Unless a word is reasonably complete and reasonably accurate it should not be made WOTD. Don't forget that for a number of people the WOTD is their portal to the project, it is the first thing they see, and if it is wrong it does not build confidence in the project as a whole. Wikipedia does this one a lot better than us, and while I don't think we need as complex a procedure as theirs I do think we could do a bit more to ensure that the terms which are published in this way are as good as they can be. - [The]DaveRoss 17:30, 28 February 2011 (UTC)

In this particular case, the issue was that I added the word as WOTD and could not play audio. In general, a good practice (besides helping out on the WOTD project itself, which I understand most people won't want to do) is, IMO, as follows: Someone who checks, or can check, pronunciations (or etymologies, or whatever) anyway should do so for upcoming WOTDs. (The March entries are at [[Wiktionary:Word of the day/Archive/2011/March]] (or the appropriate year and month). But the way WOTD templates work, the March 11 template (for example) includes last year's word until someone updates it with this year's. So don't bother checking words on the March page that have not yet been updated for 2011, as they're last year's WOTDs. The status, indicating which is the last template updated, is at [[Wiktionary:Word of the day/Status]].)—msh210℠ (talk) 17:54, 28 February 2011 (UTC)