Wiktionary:Beer parlour/2007/December

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

December

Account necessary before creating new pages

From watching recent changes, it seems like most of the pages created by anons are deleted within 10 minutes of being created. It seems to me that we should do what Wikipedia does and require an account before creating new pages. This is just my biased opinion, of course, so if anyone thinks this is worthwhile, someone should probably systematically go through new pages created by anons and figure out how many of them get speedy deleted vs how many of them stay. Then we could decide whether to require accounts for new pages. Anyone up for this? I wouldn't know how. Nadando 23:59, 30 November 2007 (UTC)

As much as I agree with you, you're stepping on a hornet's nest when you get into this area. The question of whether or not to require registration for creating or editing pages has plagued Wikipedia and sister projects since the very beginning and is probably just as contentious today as it was back then. I think even the WP policy came into being only after the John Seigenthaler Sr. Wikipedia biography controversy embarrassed WP into requiring it. Globish 00:06, 1 December 2007 (UTC)

I am not against anons editing pages. I think a lot of translations and good content come from people without accounts. I just think that most new pages created by anons get deleted soon after they are created. Nadando 00:10, 1 December 2007 (UTC)

I've seen plenty of decent entries created (or at least started) by anons. I'm not sure that banning anon page creation would do anything but push vandalism to existing pages. bd2412 T 00:35, 1 December 2007 (UTC)

Why don't folks open accounts? It seems very low risk. Is it a problem of them not knowing how? Of being under the sway of the mythology of Wikis being open for anyone to edit so that they just start editing? Of thinking Wiktionary is or ought to be covered by their WP account? I certainly found it rough going in my first couple of days here, because I made assumptions about the rules here being just like at WP, but I opened an account almost immediately. DCDuring 00:47, 1 December 2007 (UTC)

For the anon contributions I've seen, most drop in just once, create a single page (or edit a single page), then never return. For someone dropping in to make a single contribution it's really just wasted effort to select a user name, set up an account, select a password, etc. I expect it's also a function of many contributors speaking English as a second language (if at all). If you don't speak English, then the instructions are just so much gibberish. --EncycloPetey 23:39, 2 December 2007 (UTC)

I'm sorry to say that I would have to object to this very strongly unless we were willing to restrict anonymous IPs on all edits. You're not going to eliminate vandalism by prohibiting page creation. If anything you're going to concentrate the vandalism within existing entries, and preserve it in the history for eternity.

What's more worrisome is stuff that isn't vandalism, the positive but poorly contributed information that really belongs on another page, but ends up cluttering the wrong page. This already happens and it would get a lot worse if red links couldn't be filled in. Besides that, there's stuff that's well intentioned but doesn't belong here, and frankly I'd rather see a questionable entry than a bunch of questionable synonyms or the like.

Wiktionary is not like Wikipedia in that it's a lot more sparse. Each page is a short dictionary entry, not a long encyclopedia entry, and the content doesn't break off of main entries like that of our sister. Anyone who can contribute has to be able to make new pages. DAVilla 03:06, 1 December 2007 (UTC)

I think it should not be a policy. We all seem to handle this sort of thing quite fine. When I first started I didn't create a login name because of laziness; why would I bother if it is unnecessary to do so? After realizing I would be doing this for a while I signed up. If someone is forced to create a username, sure that could stop new page spam, but it could also prevent potential new users from staying on board. I would recommend keeping it the same as it is now. sewnmouthsecret 04:45, 1 December 2007 (UTC)

I agree, it does make a little more work for us, but we really don't want to scare potential contributors away. The learning curve here is steep enough already without adding any extra hurdles, and it is not as though we have too many editors yet. Conrad.Irwin 10:13, 1 December 2007 (UTC)

Excellent points by Globish, bd2412, DAVilla, Sewnmouthsecret and Conrad Irwin. I'd like to add further that the high rate of deletions among anons doesn't necessarily correspond to a high rate of vandalism. Some editors are very delete-happy and will delete a word without putting it on RFD or checking books.google.com even when there are hundreds of results at books.google.com. It's often necessary for me to remake an entry and explicitly make a discussion page saying "Don't delete this without RFD, it's supported". Editors have this idea that if it isn't in their personal lexicon, it must be vandalism. Language Lover 12:19, 1 December 2007 (UTC)

LL, when you enter vandalism or otherwise questionable terms, you are supposed to supply citations. Your exaggerations (especially regarding b.g.c.) are unhelpful. Trying an end-run around the process by claiming that your bogus entries shouldn't be questioned is completely unfounded. There is a Citations: namespace specifically for collecting relevant citations - but you suggest that garbage should be taken on faith as being legitimate? --Connel MacKenzie 05:42, 3 December 2007 (UTC)

First, I take exception to your choice of wording, implying that I enter vandalism. You are using presuppositions left and right, wording your attacks as though the things you're arguing for were already proven. A common Connel MacKenzie tactic. Anyway... The citations requirement should be for cases where citations are actually rare. Why should an editor need to take lots of time to carefully format the citations when there are 600+ entries at b.g.c? If someone were to delete cat saying, "nonce word, no citations", would you insist that it be heavily cited before restoration? If there are lots of hits at b.g.c. but you think they are *invalid*, then that's another matter, a perfect time to RFV-sense or RFV or RFD it and *explain*, saying something like: "The first twenty hits at b.g.c. don't actually support this sense." I mean come on, I know you are a reasonable guy :) Language Lover 16:19, 3 December 2007 (UTC)

Actually, I did review your deleted edits before saying that. "Proven?" When noise like this begins, sorry, but the usual suspects are at it again. (Note that DAVilla started this in direct response to my pointing out a particular VOTE problem.) Anyhow, yes, what I said is true - the Wiktionary policy is certainly gray in that realm, but without any doubt at all, "questionable" terms are supposed to be cited when entered. If previously deleted, they are not to be re-entered without at least three valid book citations. --Connel MacKenzie 07:19, 6 December 2007 (UTC)

Behold, this evildoer is a VANDAL of Wiktionary! He sulks deep in a basement and squanders his time entering nonsense just to thwart Connel MacKenzie!

If previously deleted in failure of RFV within the last year then they may not be re-entered without three required ~~book~~ citations. DAVilla 04:36, 8 December 2007 (UTC)

That comment doesn't make sense to me. How can you supply citations when entering vandalism? DAVilla 13:11, 4 December 2007 (UTC)

That isn't what I said. --Connel MacKenzie 07:19, 6 December 2007 (UTC)

I've had at least one entry I created (skull-fuck) deleted without an RFD, even though it had citations, with the comment "no legitimate use". So I do think it's possible some editors might be a bit trigger happy. --Ptcamn 08:31, 3 December 2007 (UTC)

You are incorrect...there still are no valid citations for that (and that so-called trigger happy deletion was when we still used RFD more than RFV. The restoration is the only apparent error.) --Connel MacKenzie 07:19, 6 December 2007 (UTC)

Did you hit refresh on your browser? There are a ton of cites for skull-fuck. Not in the citations namespace but on the page itself :P Language Lover 07:52, 6 December 2007 (UTC)

Not to add to the controversy, but wasn't that in Officer and a Gentleman? -- A-cai 12:49, 8 December 2007 (UTC)

Yes, I found it. It's about 13 minutes in, when Louis Gossett first talks to Richard Gere. -- A-cai 13:02, 8 December 2007 (UTC)

What exactly is invalid about the citations? --Ptcamn 00:45, 14 December 2007 (UTC)

Recent changes to Template:RFV

Someone recently edited Template:RFV and added a "note to poster" to the RFV box. It's the 01:11, 1 December 2007 revision. (You can't see it at the template page because it's includeonly. So just look at some RFV'd word.) Can we undo that change, please? It doubles the size of the already large RFV box and it isn't info which is at all relevant to our userbase. What's more, I think the overwhelming majority of people who place RFV tags are Wiktionary editors. Such a "note to poster" would be better off being spread among the editors, not put as part of the RFV box. Language Lover 11:19, 1 December 2007 (UTC)

Wiktionary Day and Main Page redesign

Several contributors have been working on a new design for the Main Page; they want to get it in place by Wiktionary Day.

The essential problem is that not enough people have had the time or interest to look at it (especially during the holidays, when some have more wikitime and some less ;-), so it is hard to make what is something of a "policy" change. And there is just about a week left.

I kinda stomped (well, did stomp) on the attempt to start a "policy" vote shortened to (then) 10 days; so I propose this: we have a seven-day vote on using the new design temporarily, showing it off on Wiktionary Day, and then run a more "normal" vote/discussion on keeping it, or going back, or wherever.

So: see Wiktionary:Votes/2007-11/Main page redesign and I encourage you to support it. Robert Ullmann 23:38, 1 December 2007 (UTC)

WOTD

In the Archive, it doesn't list December 2007. Who is handling this now? Also, the words for December don't have the WOTD template in their respective pages. I can add those if need be. sewnmouthsecret 18:14, 4 December 2007 (UTC)

Although the "Archive" page for December hasn't been set up yet, the "Recycled" page is still there, and is listing the new entries for 2007 as I put them in. I'm still keeping up with new WOTD entries (just!), but have been involved in non-Wiktionary activities the last week and a half that have taken all my time. The {{was WOTD}} template is added irregularly, often in batches. Connel was doing that for a long time, but seems to have taken a much-needed wiki-break. If you'd like to add those templates, that would certainly help. Just be sure to get all the necessary parameters in (you can look at any previous WOTD page that has the template to see what is required). There may also be some WOTD from the end of November that don't have that template yet. I did most of them last month (usually in groups of five), but life kicked in the weekend before Thanksgiving. I've got almost all the December WOTD entries selected now, but need to set up the Christmas Contest and take care of non-wiki matters. I will stay a couple of days ahead on WOTD, but won't feel fully back on top until near the end of this month, when I hope to be a month ahead again like I used to be. --EncycloPetey 04:32, 5 December 2007 (UTC)

The {{was WOTD}} template has been added to all missing entries. sewnmouthsecret 15:47, 5 December 2007 (UTC)

Thanks! --EncycloPetey 00:58, 6 December 2007 (UTC)

Negative prefix word entry automation

I note that words beginning with the various negative prefixes ("a-", "anti-", "in-", "non-", "un-", et al.) seem to constitute a large share of the words on various to-do lists at Wiktionary. I have two questions:

Would it be beneficial to have a template or program to create starter entries for such terms, defining as the appropriate flavor of negation for the word they prefix and leaving a specific clean-up marker?
Would it be beneficial to do something similar for alternative spellings involving hyphens, such as "nonsecular" and "non-secular", which occur with equal frequency in b.g.c.? DCDuring 18:37, 4 December 2007 (UTC)

Tbot-level automation is possible, but would just put the words on a different cleanup list; fully half the "non-" words I just did had some sense besides the obvious "not X", and picking out the "lack of X" nouns from the "not X" adjectives isn't trivial. That said, Tbot entries are better than redlinks. By the way, after the first few entries, I was pasting an article framework (L2, L3, en-adj, # not [[) Cynewulf 18:59, 4 December 2007 (UTC)

Ditto to the basic framework. sewnmouthsecret 19:30, 4 December 2007 (UTC)
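
As an illustration of the pasted framework described above (L2 header, L3 header, en-adj, "# not [[...]]"), a starter-entry generator might look roughly like the sketch below. This is not Tbot's actual code. The function name `starter_entry` and the cleanup marker template are hypothetical, the sketch covers only the obvious "not X" adjective sense, and every stub would still need a human check for the other senses mentioned above.

```python
# Sketch only: build starter wikitext for a negative-prefix adjective.
# The cleanup marker name below is hypothetical, not a real template.
CLEANUP_MARKER = "{{hypothetical-negative-prefix-cleanup}}"


def starter_entry(prefix: str, base: str, hyphenated: bool = False) -> tuple[str, str]:
    """Return (page title, wikitext) for a stub such as 'nonsecular'."""
    stem = prefix.rstrip("-")               # "non-" -> "non"
    joiner = "-" if hyphenated else ""      # "non-secular" vs "nonsecular"
    title = f"{stem}{joiner}{base}"
    lines = [
        "==English==",                      # L2 language header
        "",
        "===Adjective===",                  # L3 part-of-speech header
        "{{en-adj}}",
        "",
        f"# not [[{base}]]",                # the obvious "not X" sense only
        "",
        CLEANUP_MARKER,                     # flags the stub for human review
    ]
    return title, "\n".join(lines)


if __name__ == "__main__":
    title, text = starter_entry("non-", "secular")
    print(title)
    print(text)
```

Calling `starter_entry("non-", "secular", hyphenated=True)` would give the title for the hyphenated alternative spelling, which touches on the second question above, though whether that page should be a full entry or a pointer is a separate policy matter.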

Showing regional differences in translations

There might already be a better set way of doing this, but regional variations in usage are not always reflected in translations. For example, coccinelle is defined as "ladybird", but it would be useful to Wiktionary users, especially non-native speakers of English, to see something resembling:

ladybug US, ladybird UK

Larger writing in brackets, which can be seen for example at couleur, could be streamlined visually:

colour UK, color US

There could be a template with options for US, UK, Canada, Aus. and so on. I just raised these simple examples because I am familiar with those US and UK usages. I copied the way of doing it from the furigana template at the Japanese Wiktionary. Pistachio 23:04, 4 December 2007 (UTC)

I raised this question a while back with regard to the entry 奶嘴. I have modified the entry to reflect what I think you are suggesting. Opinions? -- A-cai 23:24, 4 December 2007 (UTC)

The concept is intriguing. I do have one concern about the formatting. We use superscript text in the translations section to link to other wiktionaries, and the code so displayed is the ISO language code. UK is an ISO language code, and I imagine that similar cases could occur. There is a possibility of confusing the user with these. I also wonder, will we add: ladybug US, Canada, ladybird UK, Aus, NZ, SAfr, India and similar listings? That becomes cumbersome if we don't work out how to handle such cases up front. I'd rather not see us locking into the idea that English is restricted to the US and UK, forgetting that many other countries use English as well. --EncycloPetey 04:24, 5 December 2007 (UTC)

The 奶嘴 entry has an image identifying the object, which makes such listings basically superfluous, right? I mean if there's a picture showing you what it is, is it really necessary to list regional colloquialisms? The spellings (like color vs colour, center vs centre) aren't so different that one would be lost without the superscripts, either. </2cents> — [ ric | opiaterein ] — 13:22, 5 December 2007 (UTC)

Bravo User:Pistachio - this does look promising. EP points out one quirk that should be dealt with. But offhand, I'm not sure why differences that are traditionally US/UK differences shouldn't be identified as such (getting more specific only when truly necessary and helpful to do so.) The general notion of including more specific region information is good - I'm just not sure it fits your proposed scheme. I can picture the more detailed info being stuck in a usage notes section, perhaps. --Connel MacKenzie 08:07, 6 December 2007 (UTC)

Thank you to A-Cai for the example :-D . I agree that, as Opiaterein/ric mentions, small spelling differences are mostly intuitive to native speakers of English. However, spelling conventions are not always obvious to learners of English, and all users (native and non-native speakers) might be confused by regional differences in vocabulary (not colloquialisms). Although pictures are helpful, some concepts would be awkward to illustrate with a picture, and some users will still need to navigate to "ladybird" to find out that "ladybirds" are also known as "ladybugs". I think it is more helpful and accurate to present such information on the translation page for the word which was looked up, without a user having to then navigate to another page for more information. As for the issue of confusion with links in the translations section to other Wiktionaries, I had thought there wouldn't be any, as long as the tag was not presented as a link as well. Perhaps adjusting the format, such as removing the brackets, bolding, or using different coloured text or fonts, would clarify the difference.

A further application of this concept would be to tag words with double meanings such as "cock", "ass", "pussy", "bitch" and so on to alert non-native speakers, but maybe that's an idea for another time ;) Pistachio 01:34, 10 December 2007 (UTC)

Alternative spellings policy

a draft

In English, when any two terms have the same definitions and would be pronounced the same in a given dialect, they are considered complete alternative spellings of each other. Terms that have any senses applicable to one spelling and not the other are alternative spellings only in their common senses. Terms that reflect pronunciation are dialectal spellings since they do not exist in another dialect. In the latter cases, or in the case of a pronunciation split between the spellings in a given dialect, a full entry is permitted for each spelling (or at least for a complete alternative spelling of each). Only when both meaning and pronunciation are completely shared may the two pages be merged, unless decided otherwise by consensus, such as for letter variations between the prominent American and prominent British spellings.

Alternative spellings should link to each other. If they do not, and if the two terms are identical in meaning, then this policy is not enforced until the split is discovered and the content is merged. Once that happens, only the prominent spelling should have a full entry. The prominent entry may be decided by consensus on a case by case basis.

By default, the prominent spelling is deemed to be the one formed from the 26 letters of the English alphabet, but preferring characters from borrowed terms when the phonetical purpose is widely understood, particularly ñ (n with tilde) and word-final é (e with accent), and standard hyphenation that avoids double vowels, as in re-entry. Characters, particularly ç and č (c with cedilla or caron) and ã (a with tilde), whose phonetical purpose is well understood by educated English speakers, but not widely, carry equal weight with their normalizations. Other characters that are commonly seen in English texts are allowed if necessary, particularly ü (u with umlaut) and any of the five vowels with the accent mark. These are not generally understood because of conflicting rules in various languages. Neither any other diacritical marks (including the grave mark and circumflex) nor ligatures are preferred in determination of prominent spellings.

If the spellings cannot be distinguished on this basis, for instance for differences in spacing, then a survey of ~~internet~~ sources such as Google Books should be used, or the question left open to a future contributor if unclear, or asked of the community if debated. In the case of no consensus, the default applies, or if there is no default, full entries are permitted and merger is not allowed.

This has the effect of making program and programme full entries while preferring El Niño over El Nino, ~~façade over facade, résumé over resumé,~~ and Quebec over Québec, the other ~~four~~ being stub pages. Spellings such as omelet and omelette would require a little snooping around. In any case a community decision would resolve disputes. One thing I'm conflicted on is if realise and realize should be separate, which I've done with italic text. I wish it could be one entry, but I don't know how to decide which, and I know that the community doesn't either.

Of course, I expect that any of this could change substantially before and if it's ever sent to a vote. DAVilla 00:35, 5 December 2007 (UTC)

Edited: letter variations.

Edited: + word-final é, ã (a with tilde), grave mark, circumflex DAVilla 23:16, 6 December 2007 (UTC)

Edited: no consensus, the default applies, or if there is no default, DAVilla 10:42, 6 December 2007 (UTC) DAVilla 23:27, 6 December 2007 (UTC)

Edited: and standard hyphenation that avoids.... Other characters.... These are not generally understood because of conflicting rules in various languages. DAVilla 08:28, 8 December 2007 (UTC)

Edited: for instance for differences in spacing DAVilla 08:42, 8 December 2007 (UTC)

The easiest thing obviously would be to have one article with all the information so that we wouldn't have to add the same information to multiple places. Of course, this brings up the "preferential" war. This is a touchy subject and probably always will be. — [ ric | opiaterein ] — 01:29, 5 December 2007 (UTC)

Sweet, good work DAVilla, I like I like. I've done some work on trying to fix this policy before, so I appreciate how much deliberation is needed. Great work, keep it up. Language Lover 01:52, 5 December 2007 (UTC)

I'll have to give this some serious thought. One issue I think we should settle first, though, is what exactly the L3 page section will be called (it doesn't have to match the current in-line {{alternative spelling of}} template). That discussion was opened, but never settled. I'd like to see it resolved before we try to take the next step. --EncycloPetey 02:42, 5 December 2007 (UTC)

I like simple, common sense answers to things. I feel having a separate entry for each spelling is best, with each entry having full definitions and referring to other spellings. To me, it makes little sense to merge alternate spellings onto one page. sewnmouthsecret 02:50, 5 December 2007 (UTC)

The difficulty then is the pages go out of synch, or you have to keep them in synch at great tedium. Language Lover 02:52, 5 December 2007 (UTC)

But they shouldn't always be kept in synch in every case. Consider British kerb /kɜːb/ and US curb /kɝːb/. Not only is the spelling different in each region, but the pronunciation is different, so the pronunciation is essentially particular to the spelling. There are other cases of this where the spelling change isn't so radical, but the pronunciation still differs (such as any word with a long o where the spelling differs). In short, going to synched pages will not work in all cases. It might work for some, but not all. --EncycloPetey 03:12, 5 December 2007 (UTC)

Firstly, the pronunciation is identical. The difference here is just phonetic: [ɜː] always corresponds to [ɝː]. Every single vowel differs a little or a lot between dialects, and even between individual speakers of the same dialect. We just happen to write them using different symbols for this particular example, but if we wanted to get detailed we could do it for every single vowel. So-called "[ʊ]" is not the same in American English as in British, and so forth.

Secondly, there are British people who say [kɝːb], and Americans who say [kɜːb]. Neither is a monolith, but includes many dialects, including both rhotic and non-rhotic ones. --Ptcamn 11:46, 5 December 2007 (UTC)

And there are Brits who spell the word gray and Americans who spell it grey, but we don't make a fuss over that now, do we? --EncycloPetey 14:19, 5 December 2007 (UTC)

Are you suggesting that we just pretend that AAVE, Irish English, NY English, West Country English and so forth, don't exist? --Ptcamn 15:08, 5 December 2007 (UTC)

Your wording indicates that you are suggesting that. But perhaps you just worded it unclearly? --Connel MacKenzie 15:15, 5 December 2007 (UTC)

Yes, probably unclearly. It's the difference between broad and narrow transcriptions. But see below: either argument is irrelevant to the proposal. DAVilla 04:15, 6 December 2007 (UTC)

Slow down folks, we don't need to argue this one! The wording states that the pronunciation must be different "in a given dialect". Of course pronunciation is going to vary between dialects, but do the two splinter in any single dialect? As an American, I would pronounce kerb and curb identically. I would imagine that a person in the UK would read kerb the same as curb, so this does not qualify them for two full entries on those grounds.

The relevant question more simply is if they have all identical definitions, and they do not. So without even considering regional differences, they are each automatically granted full entries. DAVilla 04:13, 6 December 2007 (UTC)

So let's look at an example where the spelling and pronunciation are different but (as far as I know) the definitions are the same. US has moralize ((General American) mɔːrəlaɪz), while the UK has moralise ((Received Pronunciation) mɒrəlaɪz). How do you envision such a case would be handled? Please set up a demo page if that helps to illustrate what you mean. --EncycloPetey 04:22, 6 December 2007 (UTC)

Great example, but I don't see how the pronunciation argument is any different. As an American, I would pronounce either as mɔːrəlaɪz. I imagine a person in the UK would likewise read moralize the same as moralise. So they are not granted separate entries on the grounds of pronunciation.

What is different in this instance is that the definitions are identical, as you point out. I had originally commented above that I'm conflicted as to whether such cases should be separate. Certainly it would be possible to have a single page and to put all pertinent information there, including both pronunciations. However, this is a contentious issue, and I do not believe it productive for the community to be asked to pick a single title of the two. That's the reason for the italic text. If we were to keep it, "variations between the prominent American and prominent British spellings" would prohibit the two pages from being merged.

It is specified that alternative spellings should link to each other, but not how. I will leave your example for you to demonstrate a preference in this case, if you should wish, and I ask others to respect it while this discussion is in progress. DAVilla 04:49, 6 December 2007 (UTC)

It might be worthwhile to start assembling a list of the different categories of cases with specific examples. I think a big part of the confusion is that there are so many different cases being considered without trying to figure out how they're different from each other. It wouldn't have to be a long list—just enough to show us what we're dealing with. That way we'll be able to determine what problems we face with each case, and to throw around ideas how to deal with them. --EncycloPetey 05:01, 6 December 2007 (UTC)

One small sub-set would be US vs. UK (thanks to several people who helped identify those groupings.) Another would be diacritics. Another would be hyphenation. Another would be compound words vs. separate word phrases. Likewise Category:India English, etc. Another would be Special:Prefixindex/re-e + a + i + o + u vs. Special:Prefixindex/ree. The larger problem is that the premise is wrong: there should be no brand new policy taking the opposite direction of the past several years...the separate spellings should be further distinguished from each other, not commingled. --Connel MacKenzie 07:51, 6 December 2007 (UTC)

But I'm not sure that all the US / UK words differ in just one way. I'm thinking of how we want to handle situations based on, say, just the spelling differs regionally without differences in pronunciation or meaning. Or, perhaps there is a situation where only the pronunciation differs significantly. Some of these situations can easily be handled by current format standards, and it would be nice to identify those patterns as well. Finding these patterns would allow us to determine just how much of a problem we are looking to "solve". It may affect a relatively small number of entries, or it may affect thousands. Until we determine what we're trying to "fix", we're likely to go round and round arguing standards in a vacuum. I think clarifying the issue will be beneficial no matter which way we decide to go. --EncycloPetey 14:43, 6 December 2007 (UTC)

DAVilla, you seem to be pushing a point now, in direct conflict with previous consensus, to support your favored candidate on an issue that you now wish to see resurface. I think you are behaving in a very irresponsible manner. The hackneyed wording above makes it very clear that you can't quantify what exceptions you are trying to make. Different spellings = different entries. That is and has been policy. Look at some of the responses you've generated here - even Ptcamn has missed the issue, recommending instead to obscure subtle differences in IPA notation, rather than indicate (helpfully) which pronunciation applies to which region. If anything, policy could be enhanced to emphasize that different spellings must be encouraged to flourish as complete, separate, divergent entries and that syncing entries is almost always wrong. --Connel MacKenzie 15:14, 5 December 2007 (UTC)

Yes, I am pushing a point, if that is what discussion of the topic amounts to. What you missed is the point I am pushing. What I am primarily trying to do is to codify current practice and to establish channels for resolving disputes. I am tired of the flame wars.

We do not and cannot have completely duplicated entries for every alternative spelling. Discussion has always focused on the contentious ones, but remember how widely this issue umbrellas. Would you rather we RFD Template:alternative spelling of than use it on encyclopædia and coöperate? At the moment tea-cup and tea cup simply redirect to teacup. Why hasn't anyone raised hell about that?

What I am trying to do is to first of all isolate those cases that are contentious. Gray and grey would have full entries if we agree that prominent American and British spelling differences should, which as I understand it is the community opinion as well as yours. In cases that don't matter so much, like fire fighter v. firefighter, one clear way to determine the prominent entry is with a little investigation. I've gone beyond that in interpreting some trends. If my wording is "hackneyed" then it's because the selection of a main entry is complicated. At the moment, El Nino and resumé are alternative spellings, and Québec doesn't even have an English section. Why that preference?

The simplest explanation I can find is that the tilde and cedilla are understood—yes, in English—and even expected in certain words. I reasoned that the accent mark is known, though not expected over semivowels, because of the influence of Spanish and French in the US and UK respectively. Likewise the umlaut is known from Spanish and German, though only considered over u since others are a much rarer find in English. However, unlike the consonants, the purpose of these diacritics is not understood over vowels. I could guess at several reasons for this: the purpose is inconsistent between languages; pronunciation rules in English are often inconsistent; vowel sounds vary greatly between dialects. Regardless, it seems that while ñ and ç are acceptable for borrowed terms, the accent marks are more readily dropped. Likewise ligatures, more commonly of vowels (æ and œ), are regularly converted.

Therefore, by default, El Niño is preferred over El Nino, façade over facade, allowing the infiltration of funny characters. At the same time, Quebec is preferred over Québec since the accent is not strictly necessary. Likewise resume is preferred over resumé and résumé. However, resume is not a complete alternative spelling both because it has other senses and because it has an alternate pronunciation. Hence one of resumé or résumé may have a full entry. Since the accent is necessary in this case, it is allowed, and résumé is preferred over resumé. Like I said, it's complicated.

Now remember that this would only establish a default. The key is that consensus on any individual term could overturn this by designating another as the primary or by requiring duplicate entries. In the examples above, another possibility is that facade is the preferred spelling in the US. If you could demonstrate that to the community then the rule in italics would allow full entries at both. DAVilla 04:27, 6 December 2007 (UTC)[reply]

I can think of two other uncommon diacritics that have certain fixed uses in English: the háček from some Czech and Slovak words, and the o-tilde that occurs in some place names of Portuguese origin. They are rare in English, but some words typically have them. (You almost can't spell háček without one!) --EncycloPetey 04:34, 6 December 2007 (UTC)[reply]

I haven't seen that enough to draw any conclusions. On a few words it appears that you are correct, but remember that for individual terms we can always bring the issue to forum. I do like having more examples to draw from though. I've recently realized that the purpose of a final accented e seems to be understood, as in café and cliché. DAVilla 09:47, 6 December 2007 (UTC)[reply]

DAVilla, all the examples you gave are misleading or inaccurate. Tea-kettle spillover caused invalid redirects for tea cup et al. Résumé obviously is the non-naturalized French spelling, while resumé obviously reflects the English pronunciation (in deference to resume.) El Nino is somehow attested, in and of itself. I, for one, have never noticed the accent in Quebec before (my wall map uses both spellings - with and without the accent.) And having supplied ample evidence for facade three times now, it is clear the RFV nay-sayers will never listen to reason...I rarely do more on RFV than just submit new items now because of that.

You know that dictating a "default" usually prevents reasonable expansion later on. Have we learned nothing from previous mistakes? Every example you list, itemizes something (or some things) that need further expansion and differentiation. Not less! The wording you gave above implies that many entries should have content removed and replaced with stubs. That is opposite the long-term automation goals for assisted expansion of exactly that class of entry. --Connel MacKenzie 07:39, 6 December 2007 (UTC)[reply]

A few points of clarification. I'm not suggesting the deletion of El Nino, but being "somehow attested" doesn't amount to much. RFV has never been taken to indicate that an entry should be a full article. A lot of common misspellings could easily pass RFV. As a proponent of proper spelling, I'm sure you understand. Some entries are going to be stubs, pure and simple. The question is where to draw the line. Tea cup and tea-cup both get Google book hits, so I do think it's an accurate example. If not, there are many more alternative spellings that differ only in spacing or hyphenation.

Excepting the most recent battle, the articles I mentioned above are already stubs, so there wouldn't be much in the way of replacement. Honestly it appears that when there are complete alternative spellings, identical in every meaning, more often there is only a single article, and the others are missing or unfortunately redirect, or are stubs at best. This isn't a guise for mass replacement because there isn't that much mass to replace. Remember that alternatives in only a few senses and British vs. American variants are excluded.

Now, the policy could result in a number of unfavorable replacements because it gives the community the mobility to act on its decisions, but that wouldn't be my dictate. Just as easily it could go in the other direction. What I'm trying to push are consistent guidelines. What those guidelines are is what I want to discuss here. I have stated my position based on an interpretation of the infiltration not only of the characters but their purpose. You have stated your position based on correctness that I'm sorry to note has been hotly contested by others. Regardless, I believe they should still be open to compromise.

Personally, I am willing to grant you equality of facade on the condition that you can convince others it is the preferred spelling in American English. That's already written into the proposal. An argument based on dictionary entries would be easy to make. I am willing to rewrite the defaults, leveling resumé and résumé based on the interpretation of the final accented é, in which case other diacritics "if necessary" is somewhat ambiguous. But it would still be left for a community decision. I am also willing to grant you full entries for both in cases where there is no default and no consensus, which isn't stated above but would seem like a natural extension. However, I'm holding very strongly to the idea of defaults because I think the guideline is sorely needed. I would like to see how others feel, but I am not yet convinced that the premise is wrong. DAVilla 09:47, 6 December 2007 (UTC)[reply]

As mentioned above, we are not talking about a great mass of entries here. I am in complete agreement that a general guideline / policy is urgently needed. I am also with Connel in his analysis that different spellings are divergent, not convergent, and so should have separate, non-synchronised entries which include links to the other spelling(s), and usage notes, or something similar, which identify where or why the forms are different. (facade - façade is a good case in point). I also believe it would be an effective point of departure to have a list of examples, as this might help us to identify a series of types and so have some archetypal forms to work from (final -é, -ise & -ize, -o- & -ou- are some examples) and thereby reach something close to a consensus. At the end of the day, if someone in Taiwan (example chosen for the strong influence there of both US and UK English, as well as French) is trying to read a book by an English speaking author, he needs to be able to find the unknown word whether it is spelt with or without a diacritic, or an s-or-z, or whatever. Meanwhile, if this person is living in UK or US, (s)he also needs to know what is the preferred spelling in that country. Just my ha'porth - Algrif 14:41, 6 December 2007 (UTC)[reply]

Letter variations like -ise/ize, -yer/ier, -er/re would be nice to merge if it were possible to pick a primary entry. That doesn't seem hopeful, and your comments on US and UK English are another good reason not to attempt it. Other proposed work-arounds have failed or been left unproven. I'm focusing on places like word-final é where I believe we can make ground. DAVilla 09:09, 8 December 2007 (UTC)[reply]

Look- this is really simple. Davilla said "We do not and cannot have completely duplicated entries for every alternative spelling." I say we can; it's not that hard. We're not a print dictionary; it's very possible. If it seems daunting to anyone, put the terms on a special page where they can be watched and I'll watch them and make sure everything is as it should be. sewnmouthsecret 14:57, 6 December 2007 (UTC)[reply]

The cost of paper, printing, and distribution are not the only factors influencing the decision by other dictionaries to consolidate certain types of entries. Stated another way, "Wiki is not paper" does not mean "Building and maintaining a wiki is effortless". By reducing the amount of effort required to create or maintain alternative spelling entries, we will allow that saved effort to be spent on improving some other aspect of the project. If we could use templates to share content easily across entries, we could possibly afford not to consolidate alternative spellings, but given that we cannot yet do so, I support DAVilla's proposal. Rod (A. Smith) 16:48, 6 December 2007 (UTC)[reply]

I'm sorry Rod, but I find that view to be very short-sighted. Destroying entries that have only barely begun to differentiate each other is simply wrong. Case in point: flotation vs. floatation; the order-of-magnitude preferred spelling is currently a soft-link (i.e. bogus) entry. DAVilla's examples above are all flawed in that they are all hotly contested and are all incomplete. (But he further misrepresents what is at the various entries, entirely dismissing discussions regarding them.) I'm not sure why he's going on about facade; I am still trying to avoid that particular flame fest. The irony is that the "preferential spellings" debacle he links above is the direct motivation for his writing this - yet he concluded exactly opposite to previous consensus, eloquently skipping past every issue considered previously. Now, the various IRC syncbot discussions that I've seen make the basic assumption that expanded entries are encouraged - looking at automation-assisted synchronizing methods can't really happen, if you turn around and write a bot to do the exact opposite (as you suggest you might in your comments?) --Connel MacKenzie 17:58, 6 December 2007 (UTC)[reply]

Lacking access to such IRC discussions, I'm unfortunately not privy to them. Can they be posted on wiki? Rod (A. Smith) 18:15, 6 December 2007 (UTC)[reply]

To rectify Eclecticology's similar complaints, yes, http://carpathia.dereferenced.org/~featherdance/gfdllogs/ is (trying) to log such conversations from now on. There are several kinks being worked out still. Unfortunately, the previous conversations cannot be found nor released under the GFDL. Note too, that you should (as a sysop) have an IRC link to the right of your "log out" link that uses the Wikizine gateway to access the regular (not-yet logged) channel. (Note that all earlier discussions have operated under the assumption that public logging was prohibited - many comments there, especially when taken out of context, would be inappropriate. The new channel is explicitly GFDL to explicitly encourage reposting on-wiki.)

All that aside, do you agree that such (theoretic) automation should be pursued? Or do you maintain that it should be explicitly prohibited? --Connel MacKenzie 20:04, 6 December 2007 (UTC)[reply]

Without knowing the details, it's impossible to assess the impact that such automation will have on individual contribution, so I don't yet know where I stand regarding the automation. At face value, automation seems like a good idea, but it's important to begin with public discussions open to all members of this community. With enough eyes, all bugs are shallow. Rod (A. Smith) 20:11, 6 December 2007 (UTC)[reply]

Corrected (working) link above. --Connel MacKenzie 00:30, 7 December 2007 (UTC) http://carpathia.dereferenced.org/~featherdance/gfdllogs/ As you may recall, early discussions are often more productive with smaller groups when simply exploring possibilities. Several times now, http://wiktionarydev.leuksman.com/ has gone off in unexpected directions...to everyone's delight, in hindsight. --Connel MacKenzie 00:35, 7 December 2007 (UTC)[reply]

Well then, I object to any objections that are based on private conversations. DAVilla's proposal is in line with everything I know about, so I support it. Rod (A. Smith) 00:51, 7 December 2007 (UTC)[reply]

Your comment seems inexplicable to me - did you read what you wrote? The IRC conversations aren't the basis for the objection, they are an example of how the short-sighted policy proposal can only be a Bad Thing. --Connel MacKenzie 01:07, 7 December 2007 (UTC)[reply]

Connel, nobody wants to destroy useful entries, nor merge entries that require differentiation. Nobody thinks that an "order-of-magnitude preferred spelling" should be a soft-link entry. You make the anti-WT:AGF allegation that DAVilla is misrepresenting and dismissing relevant discussions. Then you call the proposal short-sighted because it would interfere with "automation-assisted synchronizing methods" discussed only in private "syncbot discussions". We cannot be expected to pave the way for efforts that we cannot review or participate in. Rod (A. Smith) 01:45, 7 December 2007 (UTC)[reply]

I've edited the proposal to put the characters ç and c on equal footing. That means there is no default preference between facade and façade and some reasonable determination would have to be made. Since there is already controversy, it would obviously be put to deliberation. Although the proposal suggests either without specifying which, I believe the community will decide both in this case. DAVilla 09:09, 8 December 2007 (UTC)[reply]

A test of this policy: facade v. façade

This is a mock vote as might be carried out in the Tea room or other designated place. The results here do not apply, but would indicate how the above proposal would classify these spelling variants on the assumption that one of facade or façade is the default, if either, as would be stated in policy. With consensus and waiving quorum, a few contributors' opinions would have the ability to override that default or to allow full entries for both articles.

The basic choices for this vote are:

1. Facade (only)

2. Façade (only)

3. Both

Alternatively, you may express an approval of combinations:

1+2. Either (whichever the community prefers, but only one full entry)

1+3. Facade or both

2+3. Façade or both

If facade were the default, a vote for facade or both would be equivalent to both since the default, which is dictated by consensus of the full community, would apply in the case of no consensus on these individual votes. Likewise, a default of façade would make façade or both equivalent to both. Keep this in mind when you state your preference.

Finally, you may abstain from or object to the mock vote. DAVilla 00:52, 7 December 2007 (UTC)[reply]

Votes

Façade or both. In this case façade deserves a full entry, in my opinion, regardless of whether it is the default, which I think it should be. However, from a survey of dictionaries it appears that facade is the prominent spelling in American English. Therefore it might be appropriate to have full articles for both. DAVilla 00:52, 7 December 2007 (UTC)[reply]
Both. I'm sticking with every spelling getting its own entry. sewnmouthsecret 01:27, 7 December 2007 (UTC)[reply]
Great, that's how I expected you to vote! DAVilla 01:32, 7 December 2007 (UTC)[reply]
Both. I don't think that either is preferential. They are both valid. They might even diverge in the future. - Algrif 17:07, 7 December 2007 (UTC)[reply]
Alternative spellings are not meant to imply one is more valid than the other. That implication, if it's real, is unfortunate.

Note that if they ever needed to diverge, that would automatically entitle full entries. DAVilla 09:21, 8 December 2007 (UTC)[reply]

Conclusions

If there were no default, the outcome would be: BOTH
If facade were the default, the outcome would be: BOTH
If façade were the default, the outcome would be: BOTH

as of today. DAVilla 22:45, 14 December 2007 (UTC)[reply]
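
The tally above can be reproduced with a rough sketch. This is only an illustration of the rule described in the mock vote, not a procedure anyone in the discussion specified: the function name `outcome`, the option labels, and the 75% approval threshold standing in for "consensus" are all assumptions made for the example.

```python
# Hypothetical sketch of the mock-vote rule: each ballot approves one or more
# options, and a default (if any) applies when no single option has consensus.
FACADE, FACADE_CEDILLA, BOTH = "facade", "façade", "both"


def outcome(ballots, default=None, threshold=0.75):
    """Return the winning option, or the default when there is no consensus.

    `threshold` is an assumed stand-in for consensus; the discussion does
    not state a number.
    """
    normalized = []
    for ballot in ballots:
        approved = set(ballot)
        # Approving "default or X" is treated as approving X alone, because
        # the default would apply anyway if no consensus is reached.
        if default in approved and len(approved) > 1:
            approved.discard(default)
        normalized.append(approved)

    options = set().union(*normalized) if normalized else set()
    winners = [
        option
        for option in options
        if sum(option in b for b in normalized) / len(normalized) >= threshold
    ]
    # Exactly one option with consensus wins; otherwise fall back to the default.
    return winners[0] if len(winners) == 1 else default


# The three ballots recorded under Votes.
ballots = [{FACADE_CEDILLA, BOTH}, {BOTH}, {BOTH}]
for assumed_default in (None, FACADE, FACADE_CEDILLA):
    print(assumed_default, "->", outcome(ballots, default=assumed_default))
```

Under these assumptions all three scenarios come out as "both", matching the conclusions listed above.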

I know you've thought this through DAVilla, so why the back-pedaling to a particularly controversial example? Surely that doesn't represent the general case you (I guess) are trying to change policy to reflect? By the way, of the possible interpretations of "waiving quorum", which do you mean, anyhow? --Connel MacKenzie 01:23, 7 December 2007 (UTC)[reply]

I picked a controversial example because I don't want people to fear it. Whether or not this changes policy will be clearer when the results are in.

By waiving quorum, I mean that one person would have the ability to override the default if no one else gave a damn. It's akin to a whitelist nomination or a bot nomination, in contrast to a checkuser nomination. DAVilla 02:23, 7 December 2007 (UTC)[reply]

December is Adverb Month

Firstly, I am sorely disappointed with the appallingly low quality of our typical adverb entry.

Adverbs are, on the whole, not equitably represented as entries here. Typically, our adverb entries are relegated to a secondary, or even tertiary, status. They are often neglected in favor of their somewhat more "sexy" cousins: the nouns, verbs, and adjectives.

Perhaps you too have noticed that almost all of our adverb entries consist simply of a language header, a part of speech header, and a minimally written definition along the lines of "In an X manner." And that is all! While such definitions are quite common in print dictionaries, we need not limit ourselves to such tersely phrased entries. My personal opinion is that such definitions usually result from miserly editing intended to cut printing costs. Unfortunately, this pattern is now set in people's minds as an appropriate and acceptable definition format for an adverb and, sadly, most editors rarely or never expand beyond this unnecessarily brief definition line. Let me remind everyone that Wiktionary is not paper. We are not limited by exorbitant printing and shipping costs, and we can certainly do better.

I want greatly to encourage change and to see us thoroughly improve this deplorable state of affairs. Therefore, effective immediately,

I hereby declare December to be Wiktionary Adverb Month.

I am personally initiating three efforts to promote the plight of our needy adverb entries. These are:

Word of the Day, which this month will feature many more adverbs.
The December Adverb Challenge
The Christmas Competition 2007

Please participate in one or more of these efforts. Hopefully, our combined participation will result in a much improved collection of adverb entries as well as an overall elevation of quality on Wiktionary. --EncycloPetey 03:32, 5 December 2007 (UTC)[reply]

Mini-challenge: Count the number of adverbs used in the preceding text. Beware! this is harder than it sounds.

I counted nineteen, but I might have missed a few because after the first 6 or 7 I got bored and irritated and rushed through the rest a bit :D — [ ric | opiaterein ] — 04:23, 5 December 2007 (UTC) [reply]

COMMENTS: I'm fond of adverbs and have thought about them at great length and discussed them at the english usage newsgroups. The conclusion I've come to is, English is a preposition-heavy language. Adverbs play a lesser role than in other languages because they can very often be, and very often are, replaced with equivalent prepositional constructions. One rich source of adverbs, often ignored or mostly ignored by dictionaries, is the -ward suffix. At one time, I added what -ward adverbs I could think of that had b.g.c. support. With the CFI environment here I'm always careful to make sure everything I add has support, sure enough even one of the supported adverbs was heartlessly RFV'd (chairward) by someone who didn't care to check b.g.c. If people were willing to recognize -ward as a regular construction, we could virtually add a -ward word for every noun, and increase our adverb count by an order of magnitude :) Language Lover 04:24, 5 December 2007 (UTC)[reply]

Why aren’t all those adverbs that you added listed as derived terms at the entry for -ward? –Such sections are half the value of affix entries! † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 14:58, 5 December 2007 (UTC)[reply]

Well, I think that simple definitions of adverbs are fine - i.e. in a <adjective> manner, or to an <adverb> extent tells you everything you need to know. You can look at the adjective for its own definition. Adding synonyms is fine, but they often turn out to be derived from synonyms of the adjective. Anyway, I'll do what I can. SemperBlotto 16:03, 5 December 2007 (UTC)[reply]

I just feel that someone looking for a definition of a word should be able to find the definition on the page for that word, without having to follow a link and assemble the definition themselves. Such minimal definitions also present real problems, particularly when the related adjective has more than one definition. In such cases, the user can't know which of the various senses apply to the adverb, and translations, quotations, etc. can't be tied to the correct sense, until the definition is properly completed and split. Consider the recent WOTD inevitably, which used to have as its definition "in an inevitable manner." There are actually two senses rolled into that single definition, so I had to split the defs and set the translations accumulated so far as "to be checked" before putting it in as WOTD. If a proper definition had been placed on the adverb entry from the start, then that later cleanup work wouldn't have been necessary. And take a look at the synonyms I added; at least one of them has no related adjective form, so having the synonyms in this case is a positive help. --EncycloPetey 16:59, 5 December 2007 (UTC)[reply]

Perhaps a really good example is brightly, which is currently defined as "in a bright manner". With that definition, can you tell what "The sun shone brightly." means? Does it mean the same thing in the sentence "The chimes rang brightly in the afternoon."? Unless the sound of the chimes caused passersby to don their sunglasses, then probably not. How will we place translations on this page if both meanings are rammed together as the single definition "in a bright manner"? --EncycloPetey 03:55, 6 December 2007 (UTC)[reply]

Yes, that was my experience with rockily when I first tried out the Collaboration of the week. I hate that many thesauruses do not list adverbs. Not every adverb ends in -ly! DAVilla 02:45, 7 December 2007 (UTC)[reply]

December Adverb Challenge

As part of Wiktionary's Adverb Month, try the following challenge:

Log 120 adverb edits this month

For purposes of this challenge, an "edit" is any of the following:

Creating a new adverb entry (in any language)
The entry should be properly formatted, with Language header, POS header, inflection line (with category), and definition/translation.

English entries should have a proper definition, not simply "In an X manner."
Converting an existing entry into an entry that has all the above characteristics and an example sentence.
Adding to an existing entry any major section (e.g. Etymology, Pronunciation, Quotations, Synonyms, Translations).

The entries or edits do not have to be made to English entries.

Do take care when identifying the part of speech. Some prepositions and conjunctions look like adverbs, and some English adjectives end in -ly (and many English adverbs do not!). If you're not sure, ask in the Tea Room, since providing help with such matters is one of the functions of that discussion forum.

Meeting the challenge should not be difficult. Statistically, 120 edits can be done as six (6) edits per day over twenty (20) days. You are on your honor to keep track of the 120 edits, but please crow here if you complete the challenge. It may encourage more participation. --EncycloPetey 04:49, 5 December 2007 (UTC)[reply]

A lot of work needs to be done on making -like suffix words. Mostly these are adjectives, but a significant amount have subtle adverb senses as well. See doglike, squirrellike, birdlike, swanlike, ..... Language Lover 05:11, 5 December 2007 (UTC)[reply]

And the situation is even worse when you start to look at non-English languages we have. I'm aiming to create 120 new Latin adverb entries this month to meet the challenge. I started making a list of Latin adverbs, and was shocked at how many links were red. Even scarier was that the blue links were for words in other languages that simply happened to be spelled the same. We currently have a mere 144 Latin adverbs listed in Category:Latin adverbs. My shorter Latin dictionary has almost that many adverbs just beginning with "a". --EncycloPetey 00:50, 6 December 2007 (UTC)[reply]

I agree that a lot of (better) work needs to be done. One aspect of adverbs which we could perhaps direct more attention to, is that they are used to modify verbs and adjectives. In many cases the adverb is almost exclusively used for one or the other. I believe this aspect should be shown more clearly on the entries, as it would help in writing good definitions, and explain good usages. - Algrif 10:32, 7 December 2007 (UTC)[reply]

Hmm... that can get complicated. Pretty much any verb-modifying adverb also modifies participles, specifically those participles that function as adjectives. What would you do for predicative adverbs (The boat was adrift.) where the adverb sits in the place of an adjective? There are also adverbs that can modify other parts of speech, including pronouns (Nearly everyone was there.) or prepositions (The post sat well beyond the property line.). However, nearly can modify verbs and adjectives as well. I suppose it might be best covered with a combination of example sentences and Usage notes, but that won't be a very clean solution. --EncycloPetey 14:47, 7 December 2007 (UTC)[reply]

You're right, of course, but if contribs are more aware of how adverbs work, then we can expect to get better entries than in an X manner. Simply thinking that surprisingly mostly modifies adjectives certainly helped me to improve it from In a surprising manner. to something which makes a little more sense. BTW, thanks for the prod. This initiative is long overdue. - Algrif 16:49, 7 December 2007 (UTC)[reply]

Christmas Competition 2007

This year's Christmas Competition is announced and is open to all contributors!

--EncycloPetey 03:49, 5 December 2007 (UTC)[reply]

Another duty for AF

Shall we have AutoFormat be programmed to add the wikilinking double brackets ([[ ]]) around lemmata in {{plural of}}, {{past of}}, and the various other templates? Since this is important for our statistics’ sake and is such a minor and uncontroversial detail whose application, AFAIK, is universal, it seems like the perfect job for our format-pedantic bot friend. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 03:43, 6 December 2007 (UTC)[reply]
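
To make the request concrete, here is a minimal sketch of the kind of rewrite being proposed. It is not AutoFormat's actual code; the function name `link_lemmata`, the short template list, and the assumption that the lemma is the first unnamed parameter are all hypothetical choices for illustration.

```python
import re

# Assumed list of form-of templates; the real set would be longer.
FORM_OF_TEMPLATES = ("plural of", "past of")

# Match {{template|lemma}} or {{template|lemma|more params}} where the lemma
# is a bare first parameter: not already bracketed and not a named parameter.
_PATTERN = re.compile(
    r"\{\{("
    + "|".join(re.escape(name) for name in FORM_OF_TEMPLATES)
    + r")\|([^|\[\]{}=]+)((?:\|[^{}]*)?)\}\}"
)


def link_lemmata(wikitext):
    """Wrap the bare lemma of each listed form-of template in [[ ]]."""
    return _PATTERN.sub(
        lambda m: "{{%s|[[%s]]%s}}" % (m.group(1), m.group(2), m.group(3)),
        wikitext,
    )


print(link_lemmata("# {{plural of|ceilidh}}"))
print(link_lemmata("# {{past of|walk|lang=en}}"))
print(link_lemmata("# {{plural of|[[cat]]}}"))  # already linked, left alone
```

The pattern deliberately skips calls whose first parameter is named (for example `lang=en` placed first), so those would still need handling by hand or by a fuller parser.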

Sounds feasible to me. (but what do I know?) --EncycloPetey 03:49, 6 December 2007 (UTC)[reply]

Please. DCDuring 04:42, 6 December 2007 (UTC)[reply]

What AF should be programmed to do is take the double brackets out (at least when there's another link on the page, since that doesn't affect statistics). DAVilla 08:22, 6 December 2007 (UTC)[reply]

I disagree. Retaining them encourages new editors to include them when they create pages themselves, even if they’re not necessary in every case. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 13:25, 6 December 2007 (UTC)[reply]

Note that there is active work on fixing the statistics counter, see bugzilla:11868. It would include the page in the count if it contains a template, such as {{plural of}}. We should wait and see what happens to that. Robert Ullmann 10:40, 6 December 2007 (UTC)[reply]

Shouldn’t {{misspelling of}} be excepted from the counted transclusions, seeing as they’re not meant to be counted in page statistics (and hence don’t get wikilinking double brackets)? † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 13:25, 6 December 2007 (UTC)[reply]

Why not include common misspellings in our statistics? Rod (A. Smith) 16:31, 6 December 2007 (UTC)[reply]

At a primitive level, so that people don't claim that we suddenly include misspellings (non-words) in our comparative statistics. A dictionary of common misspellings might wish to see how many entries are linked from {{misspelling of}}, but all other comparison will wish to see the actual entries we have. Skewing those numbers (to include misspelling entries) would be a bad idea, as it would open up a new avenue of criticism. --Connel MacKenzie 19:27, 6 December 2007 (UTC)[reply]

I don't care where some as yet to be identified complainants note that we include misspellings in "our comparative statistics" (whatever those are). Keep in mind that including misspelling entries in article count has no bearing on our count of "English" entries. Is there a valid reason to exclude misspelling entries in our article count? Rod (A. Smith) 20:02, 6 December 2007 (UTC)[reply]

That (claiming there is no criticism) is a pretty irrational position, Rod. Our sister project Wikipedia, at their entry for w:Wiktionary itself contains plenty of criticism already. But no, other dictionaries do not typically count form-of entries, nor misspelling indicators. The "count" controversy was significant around the original AHD. While it may be dubious to tweak form-of entries to be included in the main count (primarily because of newcomer contributions, especially when nearing a milestone) there is just no rational excuse for going out of our collective way to skew statistics to include things that are (elsewhere, such as Wikipedia) normally insignificant redirects. Leaving soft-redirects was a compromise to begin with...they used to be deleted very quickly and consistently, here. --Connel MacKenzie 05:34, 13 December 2007 (UTC)[reply]

Why do you think I claimed that we lack critics? I said that I don't care whether some unnamed group complains about a statistic as meaningless as our "article count". I would understand concerns about possible inflation of a claim about the number of "full length entries about standard English words" or some such, but that's not what MediaWiki article count needs to mean for us. Rod (A. Smith) 09:03, 13 December 2007 (UTC)[reply]

Would we (or someone) be able to designate which templates are included in the count? If not, then I think we should proceed actively with a solution of our own. --EncycloPetey 14:45, 6 December 2007 (UTC)[reply]

From the discussion on bugzilla, I would say not. At present the count is all NS:0 pages which are not redirects, and contain the string [[. Even in an HTML comment! The idea would be to count {{ as well. In any case, I don't think there is any NS:0 page (other than redirects) that we don't want to count? (The number of things tagged for delete or whatever should be in the noise.) Yes, one might exclude misspellings, but at the cost of all that overhead on the counting? (again, see the bug for more information). Some don't-count-this marker might be useful. Robert Ullmann 16:45, 6 December 2007 (UTC)[reply]
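
The counting rule as described here can be sketched in a few lines. This is only a paraphrase of the description in this thread, not MediaWiki's implementation; the function name `is_counted`, the `count_templates` flag, and the simple redirect check are assumptions for illustration.

```python
def is_counted(namespace, wikitext, count_templates=False):
    """Approximate the article-count rule described in this thread.

    A page counts if it is in namespace 0, is not a redirect, and contains
    the string "[[" anywhere (even inside an HTML comment). With
    count_templates=True, "{{" also qualifies, as the bug report suggests.
    """
    if namespace != 0:
        return False
    # Simplified redirect test (assumption): the text starts with #REDIRECT.
    if wikitext.lstrip().upper().startswith("#REDIRECT"):
        return False
    if "[[" in wikitext:
        return True
    return count_templates and "{{" in wikitext


print(is_counted(0, "# {{plural of|cat}}"))                        # current rule
print(is_counted(0, "# {{plural of|cat}}", count_templates=True))  # proposed rule
print(is_counted(0, "<!-- [[ -->"))                                # comment still counts
```

Under this reading, a form-of entry without brackets is invisible to the current count, which is why the bracket question and the counter fix are two routes to the same result.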

A don't-count-this-entry marker would indeed be cool. I don't think any devs consider that particular bug to be worthy of their attention. Even if they are convinced, we'd still have to have an official vote later, to have it turned on. --Connel MacKenzie 19:27, 6 December 2007 (UTC)[reply]

I like the marker idea, as there are too many nuances otherwise. But if you want a truer comparison, who's claiming that other Wiktionaries would be as vigilant about using it? I think it makes more sense to just calculate these things ourselves. Put Category:English non-entry in {{misspelling of}} etc. and then just subtract the size of that category. Or count it the other way, putting Category:English entry in all verified entries. Speaking of which, correct category size information is a more critical bug. DAVilla 22:48, 6 December 2007 (UTC)[reply]

I've thought of trying to do something with the dead end pages; most haven't been touched in ages, and need attention, not just a wikilink or two in the defs. Robert Ullmann 16:45, 6 December 2007 (UTC)[reply]

I have noted verb lemma pages where an inflected form is blue, but links only to a redirect page pointing back to the verb's lemma page. Perhaps it happens with plurals, too. Can these pages be either converted en masse to real pages or at least identified and made available for cleanup? Why should the French have more pages than us? (I know it's probably because they cheat with all those inflected forms.) OTOH, there may not be very many of these. DCDuring 17:04, 6 December 2007 (UTC)[reply]

It does indeed happen with plurals. Such was the case with ceilidhean before I fixed it. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 21:22, 6 December 2007 (UTC)[reply]

Hey, that's great! Keep in mind that eventually (in theory, at least) all Wiktionaries will have approximately the same number of entries. Just because Wiktionnaire is better at importing external dictionaries automatically, doesn't mean they aren't making good progress. (I'm a little concerned, remembering that fr.wikiquote was shut down [July 2006] for copyright violations, but I don't think the same mistakes were made on Wiktionnaire.) Our French and German coverage is still sparse, so those will be logical places for us to work on in the coming year. Having reference entries at Wiktionnaire can only help. --Connel MacKenzie 19:50, 6 December 2007 (UTC)[reply]

Links to Wikipedia redirects

petrol filling station has a {{wikipedia}} template that links to a redirect. The actual page is w:Filling station. Is this OK? SemperBlotto 11:27, 6 December 2007 (UTC)[reply]

I don't see any problem with it. It might be improved to {{wikipedia|Filling station}}, but it doesn't matter that much. And we can't reasonably do that all the time; WP may reverse/move the redirects. Better in the general case to link there with the pagename, then follow whatever redirect they have set. Robert Ullmann 11:45, 6 December 2007 (UTC)[reply]

I agree with Robert Ullmann. I just wish to clarify that one reason it's better to link to a WP redirect than to link to the redirection target is that if they reverse the direction of redirection then "see WP's filling station", linking (in the end) to WP's petrol filling station article, in our petrol filling station entry, looks a bit silly.—msh210℠ 17:34, 6 December 2007 (UTC)[reply]

I prefer to add every relevant Wikipedia link, in which case the existing title is the best way to distinguish them. Usually that's just a parenthetical, but occasionally it varies widely, as does this one. In this case it doesn't seem to make a lot of difference, but in general I think the existing titles are preferred. msh210's comment could at most only apply to one link for any given entry, the redirect of the same title. DAVilla 18:26, 7 December 2007 (UTC)[reply]

attributive forms of nouns

The attributive form of a noun serves as an adjective-like word, modifying other nouns, as computer in the computer table. When a noun has more than one word, its attributive-use form is usually (or at least sometimes) written hyphenated, as attributive-use form, though some people write it without a hyphen. Since people are likely to come across such hyphenated nouns and wonder what they mean, I started adding them, listing them as adjectives (which I thought, and to some extent think, they are). See e.g., kiwi-fruit or prisoner-of-war, and, for a list, User:Msh210/English nouns with spaces. Ruakh stopped me from adding them as adjectives, saying they're nouns: User talk:Ruakh/Archive 2007#pudding-basin. Is he correct? If so, I suppose we should add those hyphenated forms (the attestable ones, say) as nouns (with {{form of|Attributive form|foo}})?—msh210℠ 22:45, 6 December 2007 (UTC)[reply]

I don't see the value in adding an adjective section for the attributive use of every English noun. That's more a property of the English language than it is a property of any individual word. Of course, I won't be surprised if the Wiktionary faction that supports content proliferation and duplication disagrees with me on this point. Some confusion may stem from the fact that most of our POS headers describe the part of speech of the word named by the entry, but in entries for phrases, the POS headers usually describe the common functions of those phrases. Rod (A. Smith) 23:50, 6 December 2007 (UTC)[reply]

The reason we might want to have these is that some compound terms are normally hyphenated in the attributive form and some are often not. The cases that are not should probably be handled as a usage note. Would that be enough? DAVilla 02:37, 7 December 2007 (UTC)[reply]

I'd have to give this some thought, but let me address something more basic first. I presume everyone agrees we should have entries for upper-case and lower-case. Would those be considered adjectives or nouns? DAVilla 01:07, 7 December 2007 (UTC)[reply]

I'll bite. I normatively think of hyphenated collocations like that as adjectives and the non-hyphenated versions as SoP (in context). Empirically, the hyphenated collocation is used both as adjective and as noun, both in numbers too high to ignore. ~~Are you sure you want to use a hyphenated collocation for an example?~~ DCDuring 01:20, 7 December 2007 (UTC)[reply]

I would think they're adjectives too, synonymous with uppercase and lowercase. On the other hand, "tea-cup motif" shouldn't be both an adjective and a noun. DAVilla 02:37, 7 December 2007 (UTC)[reply]

My experience with attributive nouns generally follows the pattern msh210 describes. This could be one more thing to be handled under DAVilla's proposal for dealing with spelling variants, if that proposal leads to a change in our practices. --EncycloPetey 01:31, 7 December 2007 (UTC)[reply]

You could use almost any noun as an attributive in English, couldn't you? I don't see any real need to include them as adjectives, but maybe I'm not understanding what's going on fully. Wouldn't be the first time :D — [ ric | opiaterein ] — 02:40, 7 December 2007 (UTC)[reply]

You seem to be understanding the same thing I am, so if you're confused then we both must be ;) --EncycloPetey 02:50, 7 December 2007 (UTC)[reply]

As long as I'm not the only one :)

So anyway, I've never actually heard of an attributive case...not even in Finnish, which has an assload of cases. Is it something official in English? — [ ric | opiaterein ] — 02:57, 7 December 2007 (UTC)[reply]

It's not a case, but a function. Any noun placed to function like an adjective is exhibiting "attributive use". You may have seen that terminology thrown around in RfD. So, "computer code" uses computer attributively; "Christmas present" uses Christmas attributively; and "CD artwork" uses CD attributively. I'm not sure whether this shows up in any of the inflected languages, because they'd usually use the genitive for that, or a prepositional phrase. English, lacking most inflections, makes do with word placement instead. --EncycloPetey 03:02, 7 December 2007 (UTC)[reply]

Imagine a book on a topic. There, you now have a potential attributive function to every noun in the English language. I read a lamp book, then I read an electrolysis book, then I read a tissue book, and a jogging book, and a plate book. There is no limitation. Cheers! bd2412 T 03:16, 7 December 2007 (UTC)[reply]

I want to borrow your loquacity book. :) --EncycloPetey 04:36, 7 December 2007 (UTC)[reply]

Just to clarify, I never meant to add senses to house or prisoner of war; I was referring only to those words that do not exist as nouns except in the attributive sense, as prisoner-of-war.—msh210℠ 11:48, 10 December 2007 (UTC)[reply]

While it would appear so, I would argue that these nouns are not functioning as adjectives. Computer in my computer table is, and always was a noun, a thing. It's just a compound noun phrase, which doesn't change the concrete thing, a computer, into a descriptive concept. Consider adding a real adjective to the phrase -- my blue computer table. Computer doesn't function the same way as blue (my blue, computer, table?). You would read the thing as blue modifying computer table, a compound noun. In German, they would make a whole new long word out of it: mein blue computertable.
Clearly, we need not make mention of common attributive use (the function of English) -- eg. bottle opener, business lunch -- but we should have entries where the attributive use is widespread and has a life of its own. The only problem is, should they continue to be labeled Adjective? -- Thisis0 18:27, 11 December 2007 (UTC)[reply]

I would say not. As you point out, attributive use is not really the same as an adjective function; there are important grammatical differences. --EncycloPetey 05:29, 13 December 2007 (UTC)[reply]

Fine, so most of us seem to agree that these are nouns, not adjectives. Therefore, I suppose, the ones that are listed as nouns anyway shouldn't have separate senses with definitions like "Attributive form of...". But a bunch — like prisoner-of-war — are nouns with spaces, and the hyphenated form is only an attributive form. The question remains: Should these have separate entries? I think so, as people will look them up. What do you all think? (It seems that most of us agree that, if they do have entries, those entries should list them as Nouns.)—msh210℠ 19:39, 17 December 2007 (UTC)[reply]

This is part of what DAVilla was trying to do (above) with her proposal in how to decide when to have separate entries and when to use redirects. At this point, I think the two have become intertwined. --EncycloPetey 02:17, 18 December 2007 (UTC)[reply]

Homophones at lv4 header under lv3 pronunciation

Check out sonnets#French. What do we think of this formatting? — [ ric | opiaterein ] — 03:40, 7 December 2007 (UTC)[reply]

Such formatting would usually be overkill for English, but I imagine might be useful in a language like French. Most English words have at most one homophone. I do like having Homophones as a L4 header, but have doubts about the general utility of the collapsible box. --EncycloPetey 04:32, 7 December 2007 (UTC)[reply]

I really just used the box to avoid a long list before the definition. It looked strange that way.

But I do agree on the English point. It might just be better for English to do something like...

===Pronunciation===
* {{IPA|/whatev/|lang=en}}
* Homophones: x

Come to think of it, I've never seen homophones listed in an English entry. — [ ric | opiaterein ] — 04:37, 7 December 2007 (UTC)[reply]

They're all over (see appellation). I prefer a separate header for them, though, just as we do for other lists of terms. Part of my rationale is that it puts "Homophones" into the TOC, which alerts users to the fact that a homophone exists. Placing them as a single line tends to get them lost among clutter as the pronunciation section fills up. --EncycloPetey 05:38, 7 December 2007 (UTC)[reply]

Is my revision any better? Or is it just ugly? † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 13:10, 7 December 2007 (UTC)[reply]

I'm thinkin' it looks fine right now. Keep in mind I'm half awake, and the main thing I like about it is that it isn't just a straight-down list ^_^ French does tend to have a lot of homophones, so I'd like to have at least one or two 'standard' ways of organizing homophones.... — [ ric | opiaterein ] — 14:00, 7 December 2007 (UTC)[reply]

I agree one-per-line is probably better than all on one line. The addition of HTML tables was problematic enough for inflection lines (collisions with images, Wikipedia boxes, etc.), which invariably render poorly on smaller displays. So User:Opiaterein's initial layout, in that regard, is definitely "better." Switching to two or three columns when there are a lot seems acceptable. But five only works if the words are really small example words. For homophones, would it be reasonable to add a gloss for each (especially when red-linked), or just superfluous? --Connel MacKenzie 17:52, 7 December 2007 (UTC)[reply]

I think a gloss, while not superfluous, is irrelevant to the section where it appears. There are already some notes about dialectal differences that sometimes must be included following a homophone (see or), so adding more information would become visual clutter. I'd rather keep the pronunciation section limited to just pronunciation issues. --EncycloPetey 01:05, 8 December 2007 (UTC)[reply]

Talk:喇叭#Mandarin definition

I believe that verifiability is the key to Wiktionary's survival. In that spirit, here is a link to a discussion that may be of interest. It is a back and forth about a definition of a word between a sysadmin (me) and an anonymous contributor who claims to be an expert in a certain field (who doesn't :). Sound familiar? -- A-cai 00:27, 8 December 2007 (UTC)[reply]

Explanation of deletion

The text "explanation of deletion" is very vague and it makes it difficult for users to enter legitimate terms that have been deleted for unknown reasons. It is not required that three citations be included when recreating all entries, only for entries that have failed RFV within the past year. Therefore I would like to semiformally request that all deletions where re-entry is prohibited indicate the criterion that the entry failed to meet, particularly attestation or idiomaticity, which is implied by "not cited" or "not idiomatic", respectively. Even something as simple as "RFV" or "SOP" would be an improvement, although a claim like "seems to have different meaning" would be a very helpful summary of discussion. "Tosh" etc. are also acceptable since this request is not meant to override sysop discretion. However, I would like to stress that in those cases it is not necessary to cite the term completely when recreating. Questionable entries only require a convincing reference—I would imagine an entry in a printed slang dictionary or a single quotation of use—and the term can be put to RFV if still doubted. DAVilla 05:11, 8 December 2007 (UTC)[reply]

I am worried that deletion comments are not always correct. See Wiktionary:Project-Recreate-deleted-warn. DAVilla 05:55, 8 December 2007 (UTC)[reply]

Dictionary in peril

The Chinese dictionary mentioned in this article is one of the most comprehensive of any dictionary I have ever come across, on-line or off-line. The day after the article came out, the website went offline, and is still not back up as of this writing. I find it sad, because I disagree with the portrayal of the dictionary as abounding with errors. In fact, it is an invaluable reference, especially when translating older works. Here is the link to the dictionary. Right now, it says "undergoing maintenance, sorry for the inconvenience." I hope they're not going through every single entry. At 160,000+ Chinese words and phrases, it may take them a while. -- A-cai 14:21, 8 December 2007 (UTC)[reply]

If only we were so ablush at two errors in our entire project. DAVilla 18:21, 8 December 2007 (UTC)[reply]

The Guoyu Cidian online dictionary is finally back online :) Here is the new website:

http://140.111.34.46/newDict/dict/index.html. -- A-cai 07:40, 24 December 2007 (UTC)[reply]

nonce words & coinages

All good writers are free to come up with new words of their own. While some made-up words are plain invention and clearly don't merit a page here (unless in notable well-known works as provided for in CFI), others are legitimate formations whose meanings are fairly comprehensible. I have been solving this by recording them under their relevant component parts – e.g. in Against the Day which I'm currently reading, Pynchon uses such words as arnophilia and ovoõleaginous, which I have used to provide citations for arno- and ovo- respectively. My plan is that if we cite logical coinages in this way, we can then see if more than a couple of the same word turn up, in which event it would "graduate" to page of its own. Does anyone understand what I'm going on about and have any feelings on the subject? Widsith 15:01, 8 December 2007 (UTC)[reply]

Do you mean words like zillionaire, and squillionaire, etc, which do not really merit an entry (or do they?), but the basic formation should be noted for the day when they might become more common? Am I anywhere close to what you have in mind? - Algrif 15:48, 8 December 2007 (UTC)[reply]

I think those examples are well-attested. I am talking more about words which have only one citation, but which seem to make sense as "valid" words. Widsith 15:55, 8 December 2007 (UTC)[reply]

Yeah, I think so. I've done a couple of those, e.g. at -dom. DAVilla 17:44, 8 December 2007 (UTC)[reply]

Never mind, now cited, if that's what you'd call it. Three quotations spanning not even a week, and one quotation more than a century earlier. DAVilla 18:14, 8 December 2007 (UTC)[reply]

Shouldn’t those two citations be added to Citations:arnophilia and Citations:ovoõleaginous, respectively, rather than at their component affixes’ entries? (And are you sure he used ovoõleaginous, and not ovoöleaginous, or something? –The former is certainly not a “legitimate formation” in English morphology.) An affix is cited by having three terms which feature it listed in its entry, and those three terms, it is only reasonable to argue, must themselves satisfy the CFI. Following that line of reasoning, including citations for non-CFI-satisfying terms in the entries for affixes seems like a bad idea — especially for highly productive affixes like non-, and for affixes frequently used to coin nonces, such as -philia and -phobia. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 02:37, 9 December 2007 (UTC)[reply]

Oops, you're right I meant of course ovoöleaginous. Well, I considered the words as being citations for use of the affixes, taking the view that affixes by their nature are often used to form nonce-words. I understand what you're saying though. Perhaps they should go on citations pages as well, although the chance of any other examples turning up seems rather tiny. Widsith 10:02, 9 December 2007 (UTC)[reply]

But that's one of the reasons DAVilla pushed for the Citations: namespace (and one of the reasons I like the idea). That way, we can accumulate citations for rare and unusual words without having to have a full entry first. The OED does this as well, of course. They don't throw away citations for one-off words, but keep a record of them in case they should turn out to warrant an entry. --EncycloPetey 16:12, 9 December 2007 (UTC)[reply]

Yeah I get that, but many of these words are never likely to be used again and it seems a shame to dismiss them entirely for all that. Widsith 17:21, 9 December 2007 (UTC)[reply]

This is similar to the issue under #Harry potter above, and as I said there, these words should exist in the main namespace, as there is no way for readers who come across them to determine whether or not they are fictional (or not before they look them up in their favourite online dictionary). Or perhaps a better solution would be to put a redirect (or soft redirect) to a glossary of terms in that book/series (Concordance?). In terms of using them as citations, if they do not get an entry in the main namespace, they shouldn't be used as an example - as it seems to be the consensus that they don't count as words. Conrad.Irwin 21:44, 9 December 2007 (UTC)[reply]

Though I disagree, I understand your last argument for terms that don't meet CFI, as in this case. Constructed example sentences should likewise be grammatically and otherwise correct. However, that conclusion would be crossing a line if an unattested word happened to appear in a citation for another term. We should not censor quotations because we believe them to be invalid any more than we should censor metaphoric usage.

As for the proper space for these definitions, I'm all for stripping out protologisms in our list that don't have any backing at all. Not just neologisms, but even every protologism should have a Citations page, in my view. DAVilla 00:21, 10 December 2007 (UTC)[reply]

I support DAVilla’s proposal to remove all terms from WT:LOP which lack even a single quotation of use. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 03:02, 10 December 2007 (UTC)[reply]

Perhaps I've missed the point of the LOP, but I thought it was (in part) for non-words that the adding editor thinks ought to be a word. Many of them, of course, are therefore humorous concoctions, but it seems to me this serves the function of a safety valve, to keep people from taking inappropriate steps to get their word included in the mainspace. If not LOP, then we should have some other space set aside for people to put down their 'ought-to-be-a-word' creations, so long as the proposals themselves are not defamatory or purely idiotic. bd2412 T 03:53, 10 December 2007 (UTC)[reply]

I think a great example which seems to be what you're talking about, are the -like words. Take practically any English noun and append -like and you get an adjective (sometimes an adverb too). An example is windshieldlike. It has 1 b.g.c. hit, 0 g.g.c. hits, and 2 legitimate google hits. Should we add it? What about podiumward, which has 1 legit google hit and not much else? These examples have very low attestation, but when they occur, we immediately know what they mean and we'd be insane to complain the author is making them up. So they are a very difficult area for CFI purposes. Language Lover 04:34, 13 December 2007 (UTC)[reply]

In that regard, it's incredible how many derivatives get an "it doesn't exist" treatment from dictionaries. An amazing number of French adverbs in -ment get this. Circeus 05:24, 13 December 2007 (UTC)[reply]

LanguageLover, the difference in those cases is that the author is taking a well-established, standard suffix and attaching it to a well-known noun by means of well-established rules for using the suffix. The same cannot be said of the majority of items added to the LOP. --EncycloPetey 05:27, 13 December 2007 (UTC)[reply]

Yes. It is exactly these well-established logical nonces that I was talking about - not the LOP stuff. Widsith 11:39, 13 December 2007 (UTC)[reply]

IATA and ICAO airport codes

Wikipedia has a listing of something close to 5,000 three-letter International Air Transport Association airport codes (see List of airports by IATA code: A to List of airports by IATA code: Z), as well as a similar number of four-letter International Civil Aviation Organization airport codes (see List of airports by ICAO code: A to List of airports by ICAO code: Z). Some of these, particularly among the IATA codes, are quite well known (MIA and LAX spring to mind). I propose that we include all of these in our dictionary. To make this inclusion easier, I propose that we hire a bot to sift these codes out of those Wikipedia pages and create wikiquote articles on them (along the lines of LAX). Does anyone think this is a bad idea? bd2412 T 07:42, 9 December 2007 (UTC)[reply]

I think it's a good idea as long as the bot can handle adding the codes to the correct capitalization, and can deal with pre-existing entries. --EncycloPetey 07:44, 9 December 2007 (UTC)[reply]

Bearing in mind that I lack bot-fu, and would be counting on someone else to invent the right code. It seems to me though that the stuff on the 'pedia pages should be easy enough to sort out and format correctly. As for existing entries, I would guess that if they are already initialisms, said bot would be taught to just add this as another def, and if they are (for some odd reason) not, then the bot would have to add that L3 header. bd2412 T 08:16, 9 December 2007 (UTC)[reply]
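The merge rule sketched in the comment above (add a definition under an existing initialism header, otherwise add the L3 header) could look roughly like the following. This is only an illustration: the function name `add_iata_sense`, the `===Initialism===` header text, the definition wording, and the `==Translingual==` section for brand-new pages are all assumptions, and the sketch ignores which language section a header belongs to.

```python
from typing import Optional

HEADER = "===Initialism==="  # assumed L3 header text


def add_iata_sense(code: str, airport: str, existing: Optional[str] = None) -> str:
    """Return entry wikitext with an IATA airport-code definition added."""
    code = code.upper()  # airport codes are written in capitals
    definition = f"# IATA airport code for {airport}."

    if existing is None:
        # No page yet: create a minimal one (layout is an assumption).
        return f"==Translingual==\n\n{HEADER}\n'''{code}'''\n\n{definition}\n"

    lines = existing.rstrip("\n").split("\n")
    if HEADER in lines:
        # Append after the last "#" line of the existing initialism section.
        start = lines.index(HEADER)
        insert_at = start + 1
        for i in range(start + 1, len(lines)):
            if lines[i].startswith("="):  # next header ends the section
                break
            if lines[i].startswith("#"):
                insert_at = i + 1
        lines.insert(insert_at, definition)
    else:
        # No initialism section yet: add the L3 header at the end.
        lines += ["", HEADER, f"'''{code}'''", "", definition]
    return "\n".join(lines) + "\n"
```

A real bot would also need to handle capitalization collisions (lax vs. LAX) and multi-language pages, which is where most of the effort would likely go.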

I don't think that would be helpful. I wouldn't mind a handful of well-known ones, but we're a bit too lenient on acronyms in my opinion. They should be citable in running text as with any other term, and additionally I feel that they should not be explained in the text. There are many unheard and unknown organizations, for instance, that would qualify otherwise. This parallels our policy on stock symbols. DAVilla 17:18, 9 December 2007 (UTC)[reply]

For every three letter combination for which we have an entry, shouldn't we have the IATA code as an alternative definition? bd2412 T 20:21, 9 December 2007 (UTC)[reply]

At least in that case of apple pie there is a figurative use that pertains to the sense in question. At least in the case of Moby-Dick the well-known character is from the same work. I'm not a fan of the space docking theory that an existing page opens the floodgates. I believe that some but not all company names, some but not all book titles, some but not all airport codes have entered the English language. While it is difficult to convince others, at the same time I further believe that this is independent of the other senses, per DuPont, Crime and Punishment, and LAX. Analysis in this view is what landed our approved but still infantile brand names criteria. DAVilla 00:29, 10 December 2007 (UTC)[reply]

I thought acronyms/initialisms were rather in a class by themselves. It should be fairly easy to find citations for the major airports, at least, but that kind of misses the point of letting a bot do the work of plucking them from 'pedia and plugging them in here. bd2412 T 02:58, 10 December 2007 (UTC)[reply]

As it stands they are, which is a mess. All of this is just my opinion though. Others may consider if we want to have these, and maybe a much greater number of codes, like stock symbols etc. Scientific names are prescriptive, so we aren't as purely descriptive as we claim to be. Maybe we could have another namespace for specifically that type of information? These international codes are rather unique in that they don't pertain to specific languages. They're all "translingual". DAVilla 07:23, 10 December 2007 (UTC)[reply]

I want to underscore, also, that the IATA codes are not brand names - they are an internationally agreed-upon set of references for specific airports, not designed to promote any particular airport. It just seems to me that this is the sort of thing a dictionary should have. I don't see a problem with all of them being translingual as well. bd2412 T 17:14, 10 December 2007 (UTC)[reply]

(clear indent) There is no reason that Wiktionary should not include these, along with all other abbreviations. They definitely should not be in a separate namespace as anything that goes down that route seems doomed to stagnation (except for Appendix:Names?). I see no problem with having scientific terms and molecule names included, the more information we have, the better a resource we will be. Conrad.Irwin 14:09, 12 December 2007 (UTC)[reply]

It's a little abstract, but the reason other namespaces are "doomed to stagnation" is that they aren't integrated as well as the regular dictionary entries, such as with searches. DAVilla 19:46, 12 December 2007 (UTC)[reply]

I actually like working in the Appendix space, but these should also be regular entries (and I see no need to duplicate Wikipedia's existing articles in our appendix space, unless we are going to offer some additional functionality, such as links to the individual terms in our mainspace). I would like to add that probably the majority of these TLAs have some additional meaning which also merits a definition. Cheers! bd2412 T 22:20, 12 December 2007 (UTC)[reply]

Well, I might have to tone down "doomed to stagnation", but I still support putting these in regular entries. Interesting sideline: the reason that the search feature only searches the Main namespace is that that is where the content should be; we shouldn't try to fix it the other way around. I think that if words are listed in the Appendices/Wherever then they must be manually linked to that page from their main namespace entry (even if it wouldn't otherwise exist); I can't be the only person who looks things up using the address bar :). Conrad.Irwin 23:32, 12 December 2007 (UTC)[reply]

Further to the above, would anyone object if I copy the 26 Wikipedia pages listing these airport codes into our Appendix space, and modify them to suit the purposes of a dictionary instead of an encyclopedia? bd2412 T 21:48, 23 December 2007 (UTC)[reply]

problem with the Hanzi header

I have been skeptical about the Hanzi header for quite some time. I believe that there are instances where its format serves to confuse more than to help. The Mandarin section for 卜 is one such example. It says that the traditional character is 蔔 and that it is pronounced bǔ. The problem is that 蔔 is only the traditional for 卜 when used in compound words such as 蘿蔔／萝卜. Furthermore, when used in such compounds it is never bǔ, but rather bó or bo. In fact, 蔔 is never bǔ. When 卜 is a verb it is bǔ, and it means to divine. I have added the proper POS sections, which hopefully explain the above. In my view, once these POS sections have been added, there is no longer a need for the Hanzi section. However, Robert Ullmann feels strongly that the Hanzi header should stay. He likens it to other headers such as Symbol etc. If we do keep them, there are going to be many instances, such as this entry, which will serve to confuse our readers (see Talk:卜). Is there a way to satisfy Robert's software/database needs without confusing our readers? Robert, feel free to respond; however, I'm hoping to hear from others as well. -- A-cai 08:04, 9 December 2007 (UTC)[reply]

Most single characters can function as words themselves... so I don't see a need for a Hanzi header. — [ ric | opiaterein ] — 16:19, 9 December 2007 (UTC)[reply]

These entries are both for the character and for the word(s). If you think about the various kinds of (print) dictionaries, our entries serve both as the character entry, with a list of compounds, and as the word entry, for words that are a single character.

The traditional or simplified reference was only added to {{cmn-hanzi}} to deflect an annoying editor who kept trying to put "Style: Simplified" back into entries. I'd be very happy to lose it except that in the absence of a POS header it won't appear anywhere in the text (maybe in the zh-forms box). Even with the POS header(s), it becomes difficult for something reading the data to reliably find it.

In this entry, bó and bo should be in the Hanzi section and line, but they were not in the Unihan DB when Nanshu converted it. (And are not now.) The traditional reference would make more sense if they were there. But you might also remove tra=蔔 from that template. The explanation you give above would make an excellent Usage note under Hanzi, so the reader doesn't have to divine the correct or used combinations.

When you remove the Hanzi section, you are removing the visible (and linked) pinyin-with-tone-number, as well as the Wade-Giles and Yale transliterations. As above, they could be put elsewhere if there was a POS, but then are much harder to find.

The vast majority of entries don't have POS (or defs). Having most of them with this information in one consistent place, but a few with it missing or moved because you have removed the Hanzi section, is a serious problem. (And editing them back in is a pain; partly because of this, the exception list for the Han entries is growing rather than shrinking.)

Unless you have another place for this information that is consistent across all 22,309 entries, it is essential that this section stay. Robert Ullmann 16:45, 9 December 2007 (UTC)[reply]

I should have re-written the about page for these entries a long time ago, very much overdue; and explained all the logic behind how they were set up. Robert Ullmann 17:03, 9 December 2007 (UTC)[reply]

Is there anything wrong with the following?

卜 (traditional 蔔 or 卜, pinyin bǔ (bu3), bo (bu10), or bó (bu2), Wade-Giles pu3, pu10, or pu2)

DAVilla 17:10, 9 December 2007 (UTC)[reply]

A-cai explained what was wrong with it in the first post in this section. — [ ric | opiaterein ] — 20:44, 9 December 2007 (UTC)[reply]

If so I don't understand the argument. I think what was overlooked is that the same 卜 is also the traditional character for 卜 in the cases where 蔔 is not. DAVilla 00:37, 10 December 2007 (UTC)[reply]

(This is mostly in response to Robert) Having an entry for 蔔 (and every other hanzi character) isn't the same as having an entry for "t" or "ş" or other letters. Most single letters can't function as single words, whereas most hanzi can (at least in Mandarin.) There's already the translingual section that goes over general meanings. "蔔 (simplified 卜, pinyin bó (bo2), bo (bo5), Wade-Giles po2, po5)" is really superfluous. I've never-ever-ever seen anything other than hanyu pinyin used anywhere, so the Wade-Giles transliterations to me seem utterly useless. The numbered transliterations (bo2, bo5) aren't really necessary, either, as anyone who knows the tones (which you really should if you study Chinese) will know which marks correspond to which tones. — [ ric | opiaterein ] — 02:47, 10 December 2007 (UTC)[reply]
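The correspondence between tone numbers and tone marks mentioned above is mechanical, and a rough sketch of it follows. The function name `numbered_to_marked` is made up for this illustration, and the placement rule used (mark a or e if present, the o in ou, otherwise the last vowel) is the commonly taught convention rather than anything taken from the templates discussed here. Any final digit other than 1 to 4 is treated as the neutral tone, matching the bo5 notation in this thread.

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # includes u-umlaut


def numbered_to_marked(syllable: str) -> str:
    """Convert one numbered pinyin syllable (e.g. "bo2") to tone-marked form."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # nothing to convert
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in COMBINING:
        return base  # neutral tone: no mark

    # Pick the vowel that carries the mark.
    if "a" in base:
        pos = base.index("a")
    elif "e" in base:
        pos = base.index("e")
    elif "ou" in base:
        pos = base.index("ou")
    else:
        positions = [i for i, ch in enumerate(base) if ch in VOWELS]
        if not positions:
            return base
        pos = positions[-1]

    # Compose vowel + diacritic into a single precomposed character.
    marked = unicodedata.normalize("NFC", base[pos] + COMBINING[tone])
    return base[:pos] + marked + base[pos + 1:]
```

With this sketch, `numbered_to_marked("bo2")` gives bó, `numbered_to_marked("bu3")` gives bǔ, and `numbered_to_marked("bo5")` gives plain bo. The reverse direction is just as mechanical, which is part of why the numbered forms are convenient for software even if readers rarely need them.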

1991, Charles A. Desnoyers, Self-Strengthening in the New World, in Pacific Historical Review 60(2), pp195-219.

[...] led by Ch'en Lan-Pin and Jung Hung, the latter more widely known as Yung Wing.

OK, now you've seen Wade-Giles used somewhere. Take a look around sometime. Cynewulf 03:02, 10 December 2007 (UTC)[reply]

Let me rephrase, then. I've never seen Wade-Giles used in any guide to Mandarin, be it a dictionary, a coursebook, travel guide, whatever. I have a good handful of these books, and I don't remember any of them even mentioning any system other than HP. So the use of Wade-Giles in an English book that isn't about the Chinese language doesn't make me feel any differently, especially being that it was written and published more than 15 years ago? :p If I remember correctly, even the Chinese government uses hanyu pinyin as the official system of transliteration. — [ ric | opiaterein ] — 12:55, 10 December 2007 (UTC)[reply]

Which is where it gets political. No, I'm not going to support any measure that pushes only one world view, no matter how big China is.

I've never seen any guide to English, be it a bilingual dictionary, coursebook, or travel guide, that includes the better half of stuff we have here. DAVilla 15:22, 10 December 2007 (UTC)[reply]

We can't just decide that history starts ten years ago. Cynewulf 15:41, 10 December 2007 (UTC)[reply]

My point was the WG is outdated, and not likely to be incredibly useful. — [ ric | opiaterein ] — 16:24, 10 December 2007 (UTC)[reply]

Written text more than 15 years old is outdated, therefore not "incredibly useful"? I see. *sigh*. Bye-bye Chaucer, Pepys, Phillip Marlowe, The Iliad, and most of the old Chinese characters and literature anyway. (People still read Shakespeare? OK, we'll keep that.) Robert Ullmann 23:31, 10 December 2007 (UTC)[reply]

That's so not what I said, and those so aren't even comparable :p I was talking about a system of transliterating a foreign language, not works of fiction. :p — [ ric | opiaterein ] — 00:59, 11 December 2007 (UTC)[reply]

The problem is, there are many works out there which have, in the past, been transliterated in WG. Someone may come across one of those works and turn to Wiktionary for the definition of a transliteration therein. And hopefully we'll have the answer! bd2412 T 22:14, 12 December 2007 (UTC)[reply]

I think what opiaterein was trying to say is that Hanyu Pinyin is the standard Romanization for teaching Mandarin. Wade-Giles is rarely used in classroom instruction anymore, even though it is quite common in older English writings about China. My vote would be to use Pinyin for Wiktionary entries. I don't see a pressing need for the other Romanizations to be listed as well. I have already created a conversion chart on Wikipedia at w:Template:Pinyintable (the template appears in the Wikipedia w:Pinyin article etc). -- A-cai 08:54, 10 December 2007 (UTC)[reply]

Whether there is a "pressing need" is not the issue; WG is useful content, already present in a large number of entries. What you are doing in removing the Hanzi section is removing content, which is never acceptable. There is a lot of text out there in WG; google Pei-ching for example, and consider that the WG at 北 and 京 is very informative, explaining why the Pei-ching spelling exists (both in English texts and in WG transcriptions). And the "anymore" part isn't really relevant; we include characters and words that are archaic (or from entirely dead languages), the existence of any amount of text in WG means that we do need it. Certainly on the POS we can only show pinyin, as long as the WG can go somewhere else (which it is). Robert Ullmann 11:09, 10 December 2007 (UTC)[reply]

Robert, I'm trying to understand your point. WG is useful, so we should keep it for the individual character entries, correct? Why is it not useful for compound character entries? For example, 太極／太极 is known in the West as Tai Chi, even though the Pinyin is Tàijí. -- A-cai 12:01, 10 December 2007 (UTC)[reply]

Where did I ever say it wasn't, or might not be, useful for compound entries? There are certainly some that could benefit. But as you say, adding WG is not a pressing need. (as opposed to not removing it) At least at present, someone can look at the characters in 太極 and see where the "tai chi" reading came from. Robert Ullmann 12:13, 10 December 2007 (UTC)[reply]

We can use pinyin as the Wiktionary standard romanization while still listing other romanizations as unlinked "alternative spellings" or such (or full entries if that's what people want). Look at the Korean section of 家族. It lists four romanizations, and all four will show up in search if people come across them somewhere. I think this sort of thing would be helpful to have for Japanese entries, and for the Chinese language family as well. Cynewulf 15:41, 10 December 2007 (UTC)[reply]

If a character has a certain reading or meaning only when part of a compound, these things need to be listed on that character's page as well. There has to be a place to put these things, and the hanzi header is it. The trad/simp parameter can be made into a list, or we can do something like "trad1=蔔|trad2=卜" or something. Cynewulf 17:59, 9 December 2007 (UTC)[reply]

Yes, lots of entries have lists of compounds, and that L4 header goes under Hanzi. And there is Yale, and who knows what else that won't go with a POS. (you don't need trad2 or whatever, it works fine as is; go see) Robert Ullmann 11:09, 10 December 2007 (UTC)[reply]
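As an aside on the numbered transliterations (bo2, bo5) mentioned earlier in this thread, the correspondence between tone numbers and tone marks can be sketched in a few lines of Python. This is only an illustration, not how any Wiktionary template works: the function name `numbered_to_marked` is made up, tone 5 is assumed to mean the neutral tone (no mark), and the mark placement follows the common rule of thumb (on a or e if present, on the o of ou, otherwise on the last vowel).

```python
import unicodedata

# Combining diacritics for tones 1-4 (macron, acute, caron, grave).
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}

def numbered_to_marked(syllable):
    """Convert one numbered pinyin syllable (e.g. 'bo2') to tone-marked form.

    Hypothetical helper for illustration; any digit other than 1-4 is
    treated as the neutral tone and simply dropped.
    """
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number, leave unchanged
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in TONE_MARKS:
        return base  # neutral tone: no diacritic
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        # Otherwise the mark goes on the last vowel of the syllable.
        idx = max((i for i, ch in enumerate(lower) if ch in "aeiou\u00fc"),
                  default=None)
        if idx is None:
            return base  # no vowel found, nothing to mark
    marked = base[:idx + 1] + TONE_MARKS[tone] + base[idx + 1:]
    # NFC folds the base letter and combining mark into one character.
    return unicodedata.normalize("NFC", marked)
```

For example, `numbered_to_marked("bo2")` gives bó and `numbered_to_marked("bo5")` gives plain bo, matching the readings quoted above.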

Category:Illiterate

I have a hard time seeing this as appropriate. It is an extremely biased, not to mention subjective category (nucular is arguably far more "illiterate" than ain't ever was), not to mention some of these verb forms might be dialectal. We have {{proscribed}} (Category:Disputed usage) and {{nonstandard}} (Category:Nonstandard), we don't need this. Any extra thoughts? Circeus 17:09, 10 December 2007 (UTC)[reply]

I don't see the point in it either; it's just a small, not well-defined subset of Category:Misspellings. We should go through and remove it from those places where it appears, and then it should be {{delete}}d. Conrad.Irwin 18:53, 10 December 2007 (UTC)[reply]

Thought so, but wanted some other opinions. Have to go to class now, but will come back to it later. Circeus 19:04, 10 December 2007 (UTC)[reply]

This category tickles me. ^_^ While it may not be useful, neither are Category:Positive words or Category:Negative words, and we seem to be having a difficult time getting rid of those... Also, how is it that old school manages to be in both of those categories...? — [ ric | opiaterein ] — 01:15, 11 December 2007 (UTC)[reply]

A key difference between Category:Illiterate and Category:Positive words (et al) is that the former came along with a context tag on the sense(s) it/themself/ves. Which made it a much more "blatant" part of the article than the latter. In all honesty, I doubt many *casual* users really even look at the categories list at the bottom. Language Lover 02:30, 11 December 2007 (UTC)[reply]

At least it was ridiculously easy to move all of its content to more appropriate locations. I might have to review quite a lot of stuff from category:Nonstandard that should be marked with {{context|colloquial}}, {{context|proscribed}}, or both. Circeus 01:24, 11 December 2007 (UTC)[reply]

The classification of ain't as illiterate- or indeed even as nonstandard or colloquial- is an artifact of a thousand self-loathing elementary school English teachers with their panties in a bunch. It *might* qualify as slang, but even that's debatable. It's a word which is avoided by pretentious people who want to sound smart and well-read. No modern descriptivist linguist would raise any complaints against ain't. (On a side note, why was descriptivist deleted??) Language Lover 02:04, 11 December 2007 (UTC)[reply]

Probably a nonsense article at the time. However, as far as the written, non-dialog form of English is concerned, ain't remains at least a nonstandard spelling. While I consider myself a descriptivist, the general feelings of the population are important in defining formal and informal levels of written and spoken English, and ain't is definitely standard informal English (or at least disputed informal English). Circeus 02:49, 11 December 2007 (UTC)[reply]

(Because it had no content.) DAVilla 19:39, 12 December 2007 (UTC)[reply]

Your assessment, Language Lover, is precarious. When you say "no modern descriptivist..." you state a falsehood (probably inadvertently.) Being descriptive is precisely what descriptivism is about - it does not mean that all words, no matter what context they are used in, are widely accepted. The notion that tags and labels should be removed seems a bit psychotic. Adding {{unreferenced}} might be acceptable, as each is supposed to identify who proscribes a particular form (and why.) But blind removal of such tags is vandalism. --Connel MacKenzie 20:50, 14 December 2007 (UTC)[reply]

Good thing I never advocated blind removal of anything. I was talking about the special case of ain't. At least in the US, there isn't a living soul older than 5 years old who doesn't know and use the word. And the only reason it would be disallowed from a formal work or speech would be because of an overly pretentious editor who thinks there's something "wrong" with it. A descriptivist analysis of ain't would go like this: "A strange word which everyone uses in every context, and yet at the same time most people claim is illiterate". Now more generally, I'd appreciate it if you'd stop crying vandalism right and left. It's pretty safe to say no one who regularly posts in the Beer Parlour is a vandal. If you continue those kinds of claims I'll escalate it and put it to a vote- is Language Lover a vandal (as per Connel MacKenzie's constant allegations) and should he thus be banned, etc. Language Lover 11:42, 15 December 2007 (UTC)[reply]

Morse code

Would anyone else consider it a good idea to incorporate the Morse code versions of letters into their pages? I would think letters, numbers, and punctuation only. sewnmouthsecret 20:58, 10 December 2007 (UTC)[reply]

They are already in the appendices (see, for example, Appendix:Variations of "b") along with the semaphore, sign language, and signal flags for each. Those templates could just as easily be dropped in the letter entries. bd2412 T 21:01, 10 December 2007 (UTC)[reply]

Gotcha. Would anyone be against me working on that? sewnmouthsecret 21:06, 10 December 2007 (UTC)[reply]

Have at it! bd2412 T 23:40, 10 December 2007 (UTC)[reply]

Perhaps you could start with pneumonoultramicroscopicsilicovolcanoconiosis and supercalifragilisticexpialidocious. Both urgently needed. lol. - Algrif 09:58, 11 December 2007 (UTC)[reply]

There's no need for mockery; I'm just being cautiously helpful. sewnmouthsecret 18:30, 11 December 2007 (UTC)[reply]

My apologies. There was no offence intended. Just having a bit of fun. Please feel free to delete. - Algrif 18:39, 11 December 2007 (UTC)[reply]

I'm not easily offended; just trying to be civil. No worries. sewnmouthsecret 18:41, 11 December 2007 (UTC)[reply]
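For readers curious what a per-character Morse lookup like the one proposed in this thread involves, here is a minimal Python sketch. It is not the appendix templates themselves, whose internals are not shown in this discussion; the names `MORSE` and `to_morse` are invented for illustration, and only letters and digits are covered (punctuation is left out).

```python
# International Morse code for letters and digits (illustrative table).
MORSE = {
    "a": ".-",   "b": "-...", "c": "-.-.", "d": "-..",  "e": ".",
    "f": "..-.", "g": "--.",  "h": "....", "i": "..",   "j": ".---",
    "k": "-.-",  "l": ".-..", "m": "--",   "n": "-.",   "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.",  "s": "...",  "t": "-",
    "u": "..-",  "v": "...-", "w": ".--",  "x": "-..-", "y": "-.--",
    "z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}

def to_morse(text):
    """Return the Morse code for each known character, separated by spaces.

    Hypothetical helper: characters without a table entry are skipped.
    """
    return " ".join(MORSE[ch] for ch in text.lower() if ch in MORSE)
```

A single-letter entry such as b would then show `to_morse("b")`, which is `-...`; longer headwords work the same way, one code group per character.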

Colloquial vs. informal

Anybody able to define a proper lexicographical difference between those? I doubt it's possible. Circeus 01:56, 11 December 2007 (UTC)[reply]

I think colloquial is supposed to refer to a specific location, whereas informal is just...general stuff. — [ ric | opiaterein ] — 02:54, 11 December 2007 (UTC)[reply]

Uh-oh, I've been doing it wrong. DCDuring 03:01, 11 December 2007 (UTC)[reply]

No worries; I (and most others) probably have, too :D — [ ric | opiaterein ] — 03:09, 11 December 2007 (UTC)[reply]

Opiaterein/ric, 'Colloquial' doesn't refer to location, it comes from colloquy or 'conversation', so "colloquial" means something like "pertaining to the spoken language". I double-checked Wiktionary because I hadn't heard of it meaning 'referring to a specific location' and one of the definitions is wrong. Pistachio 03:10, 11 December 2007 (UTC)[reply]

My mistake :D Thanks — [ ric | opiaterein ] — 14:10, 11 December 2007 (UTC)[reply]

Etymology to the rescue. Thanks, P. Boy, when linguists don't get their words right,.... DCDuring 04:52, 11 December 2007 (UTC)[reply]

Really? Oops. Maybe the reason we think that is because the more informal, the less likely it is to be used across different dialects. Dictionary.com states "Informal...describes the ordinary, everyday language of cultivated speakers." Colloquial is close enough to slang that it's often confused with being "incorrect", though incorrectly. DAVilla 19:34, 12 December 2007 (UTC)[reply]

Can we get back on topic? What I'm wondering about is how we justify keeping both {{colloquial}} and {{informal}}. (and associated categories.) Circeus 15:52, 11 December 2007 (UTC)[reply]

Spanish ustedes vs. vosotros, formal vs. informal.

English yes vs. yeah, standard vs. colloquial. That looks about right to me. I think the problem is that we just don't draw any distinction between the two. Formal and informal, I think, should be used at least for terms of address. "A colloquial term of address" sounds really weird to me. The category "colloquial" should probably be a subcategory of informal, if it isn't already. — [ ric | opiaterein ] — 16:33, 11 December 2007 (UTC)[reply]

Trying to put more than one level between "formal" and {{slang}} seems like a bad idea to me. Defining the differences between "slang" and "pejorative" or "colloquial", "nonstandard" and "proscribed" is already complex! Circeus 17:38, 11 December 2007 (UTC)[reply]

A pejorative is (if I remember right) something that can 'hurt someone's feelings', but isn't always used in that way. But that's a bit iffy. I don't really have a weighty opinion about informal vs. colloquial, but between the two, if we were going to get rid of one, I'd prefer to keep informal.

As Wiktionary Day starts in the Pacific

We have our new Main Page Robert Ullmann 08:51, 11 December 2007 (UTC)[reply]

How exciting!! But, when I look at the main page, I can't see the "file" link by the speaker on the right of the word of the day box, however it appears in Template:WOTD and Wiktionary:Word of the day/December 11. Also the "Refresh" link does not work, the 2024 needs removing from the link. Conrad.Irwin 12:26, 11 December 2007 (UTC)[reply]

Fixed the refresh link. The old design had code in {{audio}} to suppress the file link, I removed it. Both work as designed now. Robert Ullmann 14:23, 11 December 2007 (UTC)[reply]

I find it is a nice uncluttered design.. BRAVO !! GerardM 22:41, 11 December 2007 (UTC)[reply]

I should point out that I am not one of the designers, they get the credit. I just facilitated getting it into place. Robert Ullmann 06:09, 12 December 2007 (UTC)[reply]

Why did the old main page say 403 languages and this one says 216 languages? 70.171.229.76 01:26, 13 December 2007 (UTC)[reply]

The new count only includes languages with more than 10 entries. See Wiktionary:Statistics. Apparently the other 187 are very minor indeed. Conrad.Irwin 01:39, 13 December 2007 (UTC)[reply]

Why set the bar at 10? Why not 5? --EncycloPetey 05:23, 13 December 2007 (UTC)[reply]

Why not 1,000? Hmmm. A couple separate breakdowns might be more meaningful, anyhow. --Connel MacKenzie 05:46, 13 December 2007 (UTC)[reply]

Vote set

On keeping this new design! Wiktionary:Votes/2007-12/New main page design Robert Ullmann 13:34, 21 December 2007 (UTC)[reply]

Prioritizing

The following exchange is copied from Talk:出, please read and share your opinions: -- A-cai 22:46, 11 December 2007 (UTC)[reply]

Let me try another approach. Given that I'm only one person, with a limited amount of time, it is important that I prioritize. How do you think my time here is best spent?

Answering questions that are posted on talk pages?
Cleaning up improperly formatted entries?
Create entries for words that are in the Appendix:HSK list of Mandarin words?
Create entries for words that you have posted in Wiktionary:Requested articles:Chinese?
Create entries for words which are commonly seen, but not included in Appendix:HSK list of Mandarin words?
Add information to the existing 17,000+ single character entries? (regardless of how obscure the character is, or whether it can be displayed by a standard font)
Add useful idioms and proverbs?
Create entries for words that appear in classic works of prose or poetry? (for example, I am creating definitions for each word found in Romance of the Three Kingdoms)

If you can rate each of the above on a scale of 1 to 5 (1 being the most important), it may help me to better understand your needs. -- A-cai 22:09, 11 December 2007 (UTC)[reply]

It's up to you. All those tasks are important. It's built brick by brick, and if it takes weeks or months to get to discussion page queries, so be it. That's what the "zh-attention" tag is for, I guess. Once they're answered, the tag can be removed--whether that's sooner or later. 24.93.170.200 22:12, 11 December 2007 (UTC)[reply]

Let's try again. That's not an acceptable answer. Adding information about some obscure character (that only two people in the world care about) cannot possibly be of equal importance to, let's say, creating example sentences for beginning words (I'm sure there are 100s of beginning Mandarin students who would appreciate it). Please, help me to help you. I'm only asking for your opinion, I'm not asking for you to sign a legal contract for services rendered. -- A-cai 22:18, 11 December 2007 (UTC)[reply]

I answered honestly, and will do so again: I think all those things you listed are of great importance. In my opinion, no single element of the language or its written form is less important than another. 24.93.170.200 22:31, 11 December 2007 (UTC)[reply]

Audience is one consideration, true, but adding information about an obscure character is potentially on the same scale of importance since it's something that you actually know how to do, while any Mandarin-speaking juvenile could add example sentences for beginning words. Do what you feel most passionate about. If you're like me, that could vary widely from day to day or year to year.

No, that doesn't mean they're all equally important, but I don't think ignoring or completely focusing on any one is productive either. As a consequence of diminishing returns, it's good to do some chores now and again. If you keep an eye on formatting then even if you don't do it all yourself, your example could aid others in doing it correctly. On the other hand, if you take too much time responding to talk pages then you create a demand for immediate answers that will drain even more of your time. Don't spend all your time on any particular task. DAVilla 19:24, 12 December 2007 (UTC)[reply]

I'm posting this one at Wiktionary:Beer Parlour. I would like to know if others feel as you do. -- A-cai 22:46, 11 December 2007 (UTC)[reply]

Nobody can tell you what to do. You are a volunteer - do what you want (it all helps). Personally I do different things until I get bored then do something else (lately it has been Italian chemistry & physics terms, but next it will be something else). SemperBlotto 22:52, 11 December 2007 (UTC)[reply]

I think you should do whatever you feel like doing, in a way that's good for you. I wake up and find something to do. Earlier today it was adding words that end in -itate, now it's adding words from this nifty little Swadesh-list. Most of the time I work haphazardly in small projects and without a lot of direction. But it does get stuff added. If I had to pick the three most important things in that list, they'd be cleaning up improperly formatted entries, adding requested words and the idioms. Not so much proverbs, but every language has its strange little sayings and idioms that are particularly useful if you want to communicate well... — [ ric | opiaterein ] — 23:54, 11 December 2007 (UTC)[reply]

Do whatever will improve Wiktionary. Like Semper, Opiaterein (and probably many others here), I vary my personal projects all the time. Sometimes I have cleaned up incorrect L2 headers; sometimes I've created entries for Latin adverbs; sometimes I've added IPA and audio to WOTD entries; sometimes I've corrected formatting on entries for names of Constellations; sometimes I've set up ISO-related language templates; sometimes I've just patrolled edits; and once I wrote an Appendix on Spanish pronouns. In doing a variety of tasks, I've kept myself more interested and active than if I had set myself to a specific task that had to be taken to completion before moving on. Other people may work differently, but that is what works for me. --EncycloPetey 01:26, 12 December 2007 (UTC)[reply]

Since A-cai appears to be seeking an affirmative opinion, here's mine:

Create entries for words that are in the Appendix:HSK list of Mandarin words.
Create entries for words which are commonly seen, but not included in Appendix:HSK list of Mandarin words.
Cleaning up improperly formatted entries.

Those are the top three tasks that I think are most important to tackle; the Appendix because it is a recognized official source of words that are important to know; words that you know to be commonly seen because you know them; and cleanup. Old or obscure words and characters would be a lower priority, as would requests (because who knows what motivates people to request words, I've seen plenty of nonsense in the English requests). Cheers! bd2412 T 01:47, 12 December 2007 (UTC)[reply]

You should do what you want ...

The IP "anon" contributor 24.93.170.200 (aka 24.93.190.134, aka Badagnani) has been annoying for a long time, constantly adding little notes demanding that someone else add information; in some cases this might be useful (we do have rfp, rfe etc tags for this sort of request), but on the Han character entries, where only a handful can be said to be "complete", it is fairly pointless. The recent habit of adding entries that are not formatted with the (e.g.) {cmn-noun} template, and adding zh-attention is annoying. Badagnani knows how to use it, as he/she has at least once, and has certainly seen it many times. (I should note here that there isn't any requirement that any inflection template be used in a new entry; but this user knows better.) Tagging it with zh-attention just makes work for someone else, as well as making it a priority it need not be. The idea of the attention cat is to tag existing entries that need fixing, not to make work for someone else. I'd suggest simply removing the tag from any talk page unless there is some unresolved error in the entry; and that user Badagnani (1) log in, and (2) do the formatting him/herself. Robert Ullmann 05:40, 12 December 2007 (UTC)[reply]

Speaking from an unbiased standpoint, it's completely unfair to bag him/her for dropping requests for cleanup when we still have people leaving plain old {{rfc}} everywhere, and a lot of the time without any explanation. </2 cents> — [ ric | opiaterein ] — 05:51, 12 December 2007 (UTC)[reply]

I'm not talking about tagging existing entries with errors; I'm talking about creating new entries not formatted, and tagging them for someone else to clean up. What would you think of a user that steadily added new entries (into the hundreds), not bothering to format anything, but just routinely adding {rfc} to every one? Robert Ullmann 05:56, 12 December 2007 (UTC)[reply]

For example, look at this edit cleaning up an entry an hour ago. Note that I don't know the word, and just formatted what was already in the entry. (I did look up the radical/stroke in our entry 板) Keep in mind that this user has been here for two years, with lots of edits. Robert Ullmann 06:04, 12 December 2007 (UTC)[reply]

One word: T-Bot. Granted, the entries are formatted correctly. But they're still bleak and crappy. :p — [ ric | opiaterein ] — 13:40, 12 December 2007 (UTC)[reply]

They are formatted correctly. And they are minimal, but it (I) is doing the best it can, not being wilfully sloppy. And they are in their own cats, not flooded into cleanup cats that should be kept clear if possible. Robert Ullmann 13:51, 12 December 2007 (UTC) oh, and since they were generated, if bad they can be deleted out-of-hand, unlike a user contribution which must be cleaned up or (sometimes) RfD'd. (Tbot won't whine on your talk page that you deleted the entry it is so proud of ;-) Robert Ullmann 13:54, 12 December 2007 (UTC)[reply]

Haha nice points about the whining, but it still doesn't change the fact that Tbot still sucks at making entries. I can see how it might be useful for languages that don't get many contributors and such, but on a large scale...not so much.

Anyway, better in the 'attention needed' category than no category at all. I still find English entries with no categorization. :( Makes me sad. — [ ric | opiaterein ] — 14:34, 12 December 2007 (UTC)[reply]

Minimal entries are better than no entry; and the way wikis work is that one person/bot creates an entry, and other people/bots add, expand, improve it. Creating "complete" entries from scratch is not typical. Note that unlike the subject of this section, Tbot has stopped to absorb all of the feedback, and will be doing better. You are still going to whine that the entries are minimal, so expand them or ignore them; meanwhile other people can use them. (oh, and a note to others: don't tell him about Special:Uncategorizedpages, he might not recover ;-) Robert Ullmann 07:42, 13 December 2007 (UTC)[reply]

I agree completely with Robert here, and I am grateful for the great work done by Tbot. The philosophy that minimal is better than nothing is the philosophy of how I edit, usually. Like Tbot (but hopefully more intelligently) I sometimes try to enter high volumes of new articles with simple definitions, since, while I could spend lots of time on each one researching etymology and quotations and related words and such, I think it is more productive to lay the groundwork and try to fill in all the holes first, and then take a more nuanced approach to the existing articles once we've done that. In my opinion, our languages are at different stages of this process, and so it would make more sense to (as we are) be more concerned about pronunciations and etymologies and other parts for English articles, than it might be for Thai. Note that this doesn't mean you shouldn't contribute however you like best (whatever makes you happy and keeps you here is the best for the project). Dmcdevit·t 08:32, 13 December 2007 (UTC)[reply]

Robert, thanks for attempting to clean up the article. Unfortunately, this type of work usually requires someone who is fluent in the language (now look at the edit). As I explained to 24.93.170.200 in a previous post, I would rather have someone post a note at Wiktionary:Requested articles:Chinese. In this case, I know that 24.93.170.200 does not speak Mandarin with any degree of fluency. That means, in addition to checking all of his formatting, I have to check his definition for accuracy as well (especially since a large number of them are rather obscure terms). I actually don't mind him using the {{zh-attention}} tag; at least then, I know that the entry requires my attention. But creating poorly formatted entries (rather than posting a note at Wiktionary:Requested articles:Chinese) of mostly obscure musical terms, without any proper citations, means that I feel pressure to clean up the entries. The reason that I feel pressure to clean up the entries is that I don't think we will attract true language experts if they only see a bunch of shoddy entries. -- A-cai 08:31, 12 December 2007 (UTC)[reply]

May I kick myself now? I had written {{cmn-noun|ts|... but then got distracted by 24.93.170.200 having put it only in a simplified topic cat, and for some reason changed it to "s". Duh. (kick, ouch!) 13:51, 12 December 2007 (UTC)

One more opinion: I think the Romance of the Three Kingdoms is important. If A-cai can add words that are found nowhere else, it will attract Chinese scholars to visit this site and some of them may stay to edit. Shoddy edits can actually act as an inspiration. Why is it so dangerous if a clean-up request stays in an entry for years? All the non-English sections of this dictionary are unsatisfactory anyway, and will be for a long time. It's a waste of talent if A-cai spends all his time cleaning up other people's mess. --Makaokalani 15:24, 12 December 2007 (UTC)[reply]

That may be true, but I think more people will be attracted to this website by translations of commonly used words and phrases. bd2412 T 22:17, 12 December 2007 (UTC)[reply]

Yeah, nobody wants to use a dictionary that doesn't even have basic words. That has to be one of the most annoying things in the world. — [ ric | opiaterein ] — 02:59, 13 December 2007 (UTC)[reply]

It's why I seldom visit es.wikt or la.wikt. Sad to say, but they're missing more basic words than they have defined. --EncycloPetey 05:22, 13 December 2007 (UTC)[reply]

To address your question: I find your fabulous idiom entries enormously enlightening and entertaining. They are a joy to come across at, here (or rather, here.) --Connel MacKenzie 05:56, 13 December 2007 (UTC)[reply]

Was that supposed to be a joke? If so, I don't get it. — [ ric ] opiaterein — 18:10, 13 December 2007 (UTC)[reply]

My response to User:A-cai's original question was not a joke at all. His work on Category:Mandarin proverbs (for example) is what all contributors should strive to match. --Connel MacKenzie 20:41, 14 December 2007 (UTC)[reply]

A Help index

I have mocked up a very brief and temporary Help index and would like views. On too many occasions I have been unable to locate help/guidance on a specific part of an entry; this invariably (with me) leads to a fudged piece which someone will probably have to correct later. Help is available on most things but finding it in the time available is often impossible. I would like to assemble a proper (is someone going to say there already is one?) alphabetical index for Help. Besides any obvious advantages it will also show up holes in or duplication of material. The major downside would be, of course, keeping it up to date. —Saltmarsh^Talk 13:33, 13 December 2007 (UTC)[reply]

WT:ELE didn't help at all? — [ ric | opiaterein ] — 13:42, 13 December 2007 (UTC)[reply]

There is an alphabetical index for all pages in the Help: namespace but things like ELE are in the Wiktionary namespace. I definitely think that something like this would be useful, but I'm not sure what would be included? Conrad.Irwin 18:05, 13 December 2007 (UTC)[reply]

The Help: namespace is the most neglected namespace here. Anything you can do to improve it is welcome. Perhaps just announce here on WT:BP, each new page you add there, asking for comments? --Connel MacKenzie 20:26, 14 December 2007 (UTC)[reply]

One good example of how precarious your index is, is the "Spelling" stuff. Almost all of what Richardb wrote (that draft policy page) has been repeatedly rejected. That one page in particular, (confer the talk page) is an excellent example of someone slapping together their fantasy, despite being opposite of common practice, with a "draft policy" tag atop. --Connel MacKenzie 20:37, 14 December 2007 (UTC)[reply]

(1) Thanks - I shall start a Help:Index

(2) Connel's point is another reason for searching out forgotten pages which may mislead the innocent, shouldn't the banner be changed to reflect this? —Saltmarsh^Talk 07:30, 15 December 2007 (UTC)[reply]

Help:Index is under construction - please comment on the structure/appearance/format. Thank you. —Saltmarsh^Talk 15:58, 15 December 2007 (UTC)[reply]

This is very good!!. There is lots of stuff out there, some more or less dubious, as CM points out. Just finding it and listing it will help. (I'm still finding things I didn't know about; I don't recall ever seeing that spellings page before ;-). I redirected Index:Help to Help:Index as that is likely to happen ;-) ;-) Being able to look up headers for example and get directed to the obscure place in ELE or subpages or ... would be very useful. Bravo! Robert Ullmann 09:27, 16 December 2007 (UTC)[reply]

I looked at the Help Index, it's a great idea. It will definitely speed up finding information. I am new to Wiktionary and it took me a lot of time to find the things I was looking for. Just one comment. I searched the Help page and found no items related to translations. I know there is information out there, maybe it is embedded in other pages. Should translations be listed? Thanks. Panda10 01:00, 20 December 2007 (UTC)[reply]

They will be - I've only got as far as "E"! —Saltmarsh^Talk 07:24, 21 December 2007 (UTC)[reply]

Tbot entries

As you may have noted, User:Tbot has been creating some new entries as it works on updating the {{t}} templates. In November, it generated several hundred, and has gotten feedback on the kinds of mistakes it was making based on bad (or ill-formed) translations table entries.

I have used that to develop a new algorithm, and it is doing very much better. Most or all of the problem cases will not occur. The entries now are almost certainly useful and correct (as far as they go).

Tbot entries go in sub-categories of Category:Tbot entries, each one in two subcats, one by month created, one by language. The newer entries are in Category:Tbot entries December 2007, the older ones in Category:Tbot entries November 2007. I will probably delete all of the (then remaining) entries in the November cat at some point.
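The two-category scheme described above can be sketched in a few lines. This is only an illustration and not Tbot's actual code: the function name `tbot_categories` is made up, and while the month category follows the names quoted above, the per-language category name is a guess at the format.

```python
from datetime import date

# Month names are spelled out so the result does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def tbot_categories(created, language):
    """Return the two sub-categories for one bot-created entry."""
    # Matches the names quoted in the thread, e.g. "Category:Tbot entries December 2007".
    month_cat = f"Category:Tbot entries {MONTHS[created.month - 1]} {created.year}"
    # ASSUMPTION: the real per-language category name is not given in the thread.
    lang_cat = f"Category:Tbot entries ({language})"
    return [month_cat, lang_cat]

print(tbot_categories(date(2007, 12, 14), "Polish"))
```

Keeping the month in the category name is what makes it cheap to find (and later strip) everything produced by an older version of the algorithm.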

Please look and comment here or at User talk:Tbot. Thank you, Robert Ullmann 09:13, 14 December 2007 (UTC)[reply]

That's a lot of stuff to have made to just delete it. lol — [ ric ] opiaterein — 11:44, 14 December 2007 (UTC)[reply]

(is this worth replying to?) so first you whine that the entries are too minimal, and then you laugh at the idea of stripping the older ones so they can be re-created (if they match the new algorithm) in better shape. Jeez. Is a process. Robert Ullmann 04:39, 15 December 2007 (UTC)[reply]

I wasn't irritated at the fact that they were minimal, I was irritated that so many of them had inaccuracies. I'm laughing at the idea of stripping the older ones, because there are so many of them. I think 10 entries per language on the old algorithm would've been fine to see how the bot would work in practice. I'm saying it was a large scale experiment just to be improved so much less than a few weeks later. — [ ric ] opiaterein — 14:01, 15 December 2007 (UTC)[reply]

You "[were]n't irritated at the fact that they were minimal"? Really? Did you write "Ugh you're killing me dude :P I didn't say they needed to be started as complete entries, just that they should have a little more, hopefully, than the crap minimum" on Tbot's talk page, or has your login account been compromised? ;-)

The number created was just about adequate to find the problem patterns; it took different numbers for different languages. (Doing a random set, rather than the first n in DB order would have helped.) The new things in the last few days were not things I was thinking of 3 weeks ago, it turns out to be not very hard to do several interesting things, and solve all of the problems from before. But didn't know that then. In any case, stripping older entries and recreating some is bot work; and it doesn't get tired. So no big deal. Robert Ullmann 08:57, 16 December 2007 (UTC)[reply]
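As an aside on the sampling point, the difference between taking the first n entries in database order and taking a random set might look like the sketch below. The entry list and both function names are hypothetical; Tbot's real selection code is not shown anywhere in this thread.

```python
import random

def first_n(entries, n):
    """Take the first n candidates in their stored (DB) order."""
    return entries[:n]

def random_n(entries, n, seed=None):
    """Take n distinct candidates chosen at random.

    A fixed seed makes the selection repeatable, which helps when
    comparing two versions of an algorithm on the same test set.
    """
    rng = random.Random(seed)
    return rng.sample(entries, min(n, len(entries)))

candidates = [f"entry{i}" for i in range(100)]  # stand-in for real page titles
print(first_n(candidates, 3))
print(random_n(candidates, 3, seed=1))
```

A random set is more likely to surface problem patterns that happen to sit late in the database order.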

Particularly in the Romanian entries, there was more to it than them being at "the crap minimum", although I do still think there should be more than that :p. The definitions weren't always spot-on, and half the time they had stuff in the parentheses that either wasn't needed, was wrong, or in one case just repeated the definition. I'm probably not ever going to be too keen on a bot starting entries. :p

Also, going through a couple of the Armenian entries, I noticed that the transliterations, while not being wrong, didn't use the ISO "internationally accepted standard" thing. So it appears that we're letting the bot go "against" Wiktionary standards if there happens to be a non-standard transliteration for Armenian, Russian, whatever doesn't use the Latin alphabet. — [ ric ] opiaterein — 12:50, 16 December 2007 (UTC)[reply]

I don't see you fixing the redundant glosses in translations sections? I don't see you out fixing the Armenian transliterations in the source entries? Or do you imagine the bot is inventing them? Want to remove all the entries that have "non-standard" transliterations, you'll have to trash half the non-Latin-script references in the wikt. The definitions can not be "spot on" until a native speaker of the FL who is fluent in English edits the entry. But then that is also true of half of the wikt, where a very large number of FL entries are made by people learning the language in question. But they can be, and almost always are, pretty good, and useful.

As you say, it has become entirely clear that no matter how well it does (and it is already better than the typical IP-anon or occasional editor), you will find something to whine about. :-) Robert Ullmann 13:21, 16 December 2007 (UTC)[reply]

Note that we don't have a standard for Armenian transliteration (we don't automatically use an ISO standard), it would be at Wiktionary:About Armenian. And as you note, they aren't wrong. Robert Ullmann 13:40, 16 December 2007 (UTC)[reply]

I don't have the time or patience to go through wiktionary fixing every Romanian translation, or every Armenian transliteration. This is a pointless discussion. I'm just going to get irritated. — [ ric ] opiaterein — 14:57, 16 December 2007 (UTC)[reply]

See fêkî. And owoc too. Robert Ullmann 07:39, 15 December 2007 (UTC)[reply]

Also see kopalnia and головастик ... Robert Ullmann 09:41, 15 December 2007 (UTC)[reply]

And śliwka. Happy now? ;-) Robert Ullmann 10:04, 15 December 2007 (UTC)[reply]

I updated the documentation a bit, to reflect the present status. Robert Ullmann 13:21, 16 December 2007 (UTC)[reply]

Vote set

At Wiktionary:Votes/bt-2007-12/User:Tbot creating FL entries as promised in the Tbot flag vote. Robert Ullmann 08:49, 17 December 2007 (UTC)[reply]

Merge Category:Colloquial and Category:Informal

Re-started with clearer title because the original thread is pretty much dead.

I fail to see how we can possibly make a reasonably clear distinction between these levels, and so I firmly believe these categories and the associate templates ought to be merged. Having two levels of distinction between "normal formal written" and "slang" just doesn't work. Circeus 15:42, 14 December 2007 (UTC)[reply]

Well-defined categories could work if someone had a proposal that made logical sense and accommodated the existing tags and categories. Documented tags and categories could be very useful. Has anyone actually analyzed the use of the tags? How many of each? Any apparent pattern to the use? How many words that ought to have such tags don't? What harm comes from something being labeled informal instead of colloquial or vice versa? I fail to see any benefit from eliminating categories and tags that may have made sense at one time to someone and could be useful again. OTOH, maybe the benefit just hasn't been explained. DCDuring 18:00, 14 December 2007 (UTC)[reply]

There is an annoying tendency to dump anything that is more {{jocular}}, but not strictly speaking {{slang}}, into {{colloquial}} (e.g. blondie, asperge), rather than {{informal}}, but that doesn't make it right. The translation given for "asperge", "beanpole", is "informal", and we have the Australian colloquialisms "barbie" and "beaut" in separate categories; this makes no sense! Even better: bellybutton? informal. belly button? colloquial! Circeus 20:03, 14 December 2007 (UTC)[reply]

It sounds to me, like you've outlined a plan for yourself to traverse both categories, seeking out (and correcting) inconsistencies. But I don't see any rationale at all for combining these two distinct categories. But specific inconsistencies are just that - specific. I disagree with your assessment about "blondie" being informal - colloquial is more accurate. The French colloquialism is quite different from French informal...so that too seems accurately labeled as "colloquial." For bellybutton/belly button I wonder why two separate UK contributors don't think it is a real, formal term (nor why they used different labels for it) but if any label should be on them, "colloquial" would be it.

The tag "informal" implies there is a "formal" equivalent. The tag "colloquial" does not...but does express both the likely context of use and often a hint at the term's origin as well. Blindly merging the two would be a serious step backwards. Specific errors may exist - those can really only be dealt with one-by-one. --Connel MacKenzie 20:21, 14 December 2007 (UTC)[reply]

But what's the difference between the two classifications? Our own entries for informal and colloquial say:

(of language) Reflecting everyday, non-ceremonious usage.

and

(linguistics) Denoting a manner of speaking or writing that is characteristic of familiar conversation; informal.

respectively. Those seem pretty close to me; the latter directly refers to the former for the definition! There's no mention of the features you claim in your above post. Even if there were, I doubt many editors painstakingly consider etymology and such when choosing whether to tag a word colloquial or informal. Let's merge these categories. Language Lover 11:24, 15 December 2007 (UTC)[reply]

Merge them because you don't see a distinction that others obviously do? Um, no. --Connel MacKenzie 04:49, 16 December 2007 (UTC)[reply]

Maybe you could do us a favor and update our entries for colloquial and informal so that these distinctions are visible. If an editor doesn't know the distinctions, we shouldn't expect the whole userbase to know them. Til then I still say merge them. As for "a distinction that others obviously do [see]", no one has yet been able to articulate this distinction and it's obvious you're exaggerating when you say that. Language Lover 10:41, 16 December 2007 (UTC)[reply]

"because you don't see a distinction that others obviously do?" Actually, that makes two of us so far, and looking at the categories provides pretty damning evidence that a great many users have employed them interchangeably, as I pointed out. So far, it's more like you don't want the change to proceed because you see a well-defined difference where we don't.

The problem is definition: "informal" is almost always defined as a variant of "not formal", whereas colloquial is defined by relating it to oral conversation, so that there are no points of comparison in the definitions to relate them to each other and properly contrast them.

Examples of confusion: [1], [2], [3]. Here is a very good explanation from w:David Crystal of how "colloquial" came to be used in the first place, and how it's been going out of use in English lexicography. Circeus 21:35, 16 December 2007 (UTC)[reply]

Since this is a continuation discussion, it seems deceptive of you to say that my statements stand individually - particularly when they only echo someone else's comments in this section. I, like you, count two people in favor of merging. Unlike your count, there seem to be quite a few more not in favor.

All that aside, explain to me how the informal tu vs. formal vous is used in this way in English. There is a rare "royalty" comparison that doesn't have the same mechanics...but other than that, doesn't exist in English. On the other hand, there is a glaring distinction (poorly defined, to be sure) between slang and colloquial. When something is a shade of slang, then, I'd choose colloquial. When something is an informal term only in deference to a formal term, I'd use the informal and formal tags. Those are different arenas. I still don't understand how you can propose a blanket (blind) merge of the two. The two flavors (colloquial vs. informal) are very different, in my eyes. You (plural) haven't made a compelling case that demonstrates how they could reasonably be considered the same.

--Connel MacKenzie 05:27, 19 December 2007 (UTC)[reply]

(Undent) Actually, the way I read it, neither Opiaterein nor DCDuring actually disagreed with the proposal. They only confirmed that our categories are at best fuzzy, at worst not defined at all, without voicing any opinion regarding the suggested merge.
This is where we have an issue: we need a well-defined criterion that can be used reliably by any user to set apart "informal" and "colloquial" first from each other, and then from "slang" and our otherwise unmarked "formal". In the absence of such a criterion (which, I should remark, you have not even tried to provide), we should merge at the very least the categories, as they are about as pointless as having an "indigo" and a "dark blue" category for shades of blue. Circeus 06:15, 19 December 2007 (UTC)[reply]

I don't know where you got the idea that (generally speaking) criteria are well defined on Wiktionary. While there has been a steady trend in that direction, with the most helpful intentions, it has resulted in enormous instruction creep overall. (For example, WT:CFI used to be simply "All words in all languages" with no dedicated page of its own.)

That aside, I certainly did try to describe the difference above. In English, when discussing the linguistic aspects of romance languages, formal and informal are the terms most often used. When discussing English terms, {{informal}} should usually be replaced by {{colloquial}}. But any native English speaker should immediately be able to pick out which remaining English terms are not colloquial but are informal. Spelling out the exact circumstances would be difficult, with no plausible gain. There still would be separate categories.

--Connel MacKenzie 08:24, 21 December 2007 (UTC)[reply]

No gain? No gain in actually spelling out the differences in the two categories you maintain are necessary? Consciously leaving such unabashed ambiguity and esoteric exclusivity is a major detriment to Wiktionary, and I wish categories, tags, templates, policies, etc. were visible, accessible, and decipherable here. As it is, these same discussions happen over and over on various policy points, and no one can actually cite precedent because it's lost in a dozen unresolved discussions.

Can this thread please go somewhere! The mealy-mouthedness is exhausting. Connel, you actually did define the differences, and did so before the others claimed you didn't:

The tag "informal" implies there is a "formal" equivalent. The tag "colloquial" does not.
"colloquial" expresses both the likely context of use and often a hint at the term's origin as well (arose via conversational English).

Let's please go from there, resolve this, and canonize it so it can be understood, referred to, and continually revised. Thisis0 18:00, 21 December 2007 (UTC)[reply]

These elements are certainly not enough to say "this is not colloquial, this is informal" or vice versa. "Colloquial" is currently a subset of "informal", but our application of these tags is so haphazard as to be completely useless, and I doubt going through these 1000+ each categories for establishing consistency will help, or be in any way feasible. I've pointed out above that the distinction is phasing out of mainstream lexicography for exactly these reasons, and is mostly a relic of a biased 18th century philosophy of language.

Given that informal writing will include "colloquial" forms (because colloquial is by definition a subset of "informal"), trying to maintain any sort of distinction is pointless. Besides, no other language on Wiktionary uses both tags; all use only one, which further demonstrates that it is not necessary. Circeus 20:00, 21 December 2007 (UTC)[reply]

Discussion with you will continue to circumnavigate in some kind of circular circus as long as you continue to ignore what the other party is saying. As has now been repeated, there is indeed enough distinction given here to say "this is not colloquial, this is informal". Once again, a term is not labelled informal unless there is a clear formal equivalent. Examples: belly button is informal (navel); lots is informal (a lot, many). Beaut, gotta, gonna, wanna, and for all intensive purposes are colloquial, because they came about through speech. In all honesty, either {{colloquial}} or {{informal}} would be appropriate for any of these, and it is truly subjective, but like Connel says, it's a "flavor" -- and an option that should not be lost, but rather expounded upon and delineated. There are distinct implications given by each of these tags ('implied formal equivalent' vs. 'chatty speech English'), and obviously the tags aren't applied consistently here. Additionally, the one article written about an observation that some dictionaries now are using only "informal" does not imply "the distinction is phasing out of mainstream lexicography". You're forgetting that print dictionaries have limited space. We do not. We also are not using the term in the "biased 18th century philosophy of language" prescriptive way. We are using it to categorize language and give some idea to what avenue the word entered use, and how it is mostly used. -- Thisis0 23:23, 21 December 2007 (UTC)[reply]

"Colloquial" has all sorts of problems, and virtually nobody has been able to build a cogent way to distinguish it from "informal," as I pointed out above. Crystal comments in one of the links I give: "Informal is also wider in its application, being equally at ease in relation to both speech and writing. Colloquial is very much bound up with speech: could one talk of 'colloquial writing'?" "Colloquial" forms are obviously in use in print, or else we wouldn't be including them in this dictionary, would we? Then how can we be so bold as to say they are speech-only forms? Circeus 01:40, 22 December 2007 (UTC)[reply]

Dude... we totally aren't saying that. That lame article is. I'm really done with this if you keep going in circles. Colloquial here categorizes any word, written or spoken, that is standard and not slang, and likely arose via casual conversational English, and is likely to be used primarily in casual conversation rather than in more formal written works, speeches, and discourse. As you pointed out, ("informal is wider in its application"), -- colloquial is something of a subset of informal, and the two terms are arguably interchangeable on some level. However, as has also been pointed out, there are some cases where colloquial is appropriate, but informal would not be. Hence, we preserve here the distinction and the unique implication given by each tag. -- Thisis0 14:01, 22 December 2007 (UTC)[reply]

Include etymology work made by the Youtube "hotforwords" woman?

Do you think it is a good idea to include the etymology work made by the Youtube "hotforwords" woman? We could ask her to release her videos under the GFDL license and then include her videos on Wiktionary. You can watch this video to see how she explains the etymology of the word "oxymoron": Oxymoron. She has made 72 videos to date and seems to keep making new ones. Tommy 22:39, 14 December 2007 (UTC)[reply]

It might bring people to Wiktionary, but I am not sure if it would improve the quality to any great extent. It is just a little too different from your standard dictionary. Still getting content released under GFDL is a bonus (in general) anyway, even if we then never use it. Conrad.Irwin 11:17, 15 December 2007 (UTC)[reply]

But as an online, wiki dictionary, we are not taking advantage of our medium anyway! What we have done was mostly to replicate the structure of traditional dictionaries! At least including videos would make wiktionary a truly unique resource. Circeus 16:55, 15 December 2007 (UTC)[reply]

After watching the video for Oxymoron...no. :| The videos are kinda silly and I don't think they'd add anything worth adding that can't be said in a regular etymology. Cute, but not anything special, necessary or overly enriching. — [ ric ] opiaterein — 18:17, 15 December 2007 (UTC)[reply]

Do those videos look amateurish to you? Or rather, does it look like a very-well funded endeavor? Is there some reason, you think, that she often flaunts OED publications? When not pictured, each video I've seen has at least referred to an OED publication. Good luck getting that corporate entity to release it under the GFDL. What would be nice, would be if she (erm, her team) could send an admin here an e-mail a few hours before releasing each video, so the various pages can be semi-protected for a week or so. --Connel MacKenzie 04:19, 16 December 2007 (UTC)[reply]

No. We should be making at least an attempt to look like a serious dictionary. [cue a number of people falling off chairs] These (well, I watched part of one) are very well done, but ... I was thinking we might link to youtube, as some people might appreciate them, but you can't do that without getting straight porno on the linked-to page (other videos you might like), along with a lot of adverts of course. Definitely not import them, so GFDL etc is moot. Robert Ullmann 13:32, 16 December 2007 (UTC)[reply]

Alternative cases for internet slang

How are we supposed to format internet slang abbreviations? ttfn is at the moment a redirect to TTFN, which I thought would have been frowned upon. Although Wiktionary seems to put these in at the uppercase version, OMG not omg (which relies on the Javascript redirects), I think it is more standard for many 'words' to see the lower-case ones, e.g. lolz (which has a slightly different entry at LOLZ); though there are some that are still more common in upper-case, IMHO for example. The difference in case can change the shades of meaning slightly, and it would be a guffaw indeed to warrant a LOL over the chuckle indicated by lol :). Has this been discussed before, or do we need to decide on a standard way of doing these? Conrad.Irwin 10:53, 16 December 2007 (UTC)[reply]

It's always just been assumed that acronyms are properly in the uppercase. I'm not sure that a distinction between LOL and lol could be documented (you know, in durable sources) and even if it could, it follows the informal rule of highlighting words with uppercase (as with italic or boldface), e.g. "No, thank you" vs. "No, thank YOU".

Perhaps a soft redirect would be good enough for now? DAVilla 06:34, 25 December 2007 (UTC)[reply]

Translations sections in FL entries

One thing I've noticed in the last few entries checking on dozens of Tbot entries is that the link to the FL.wikt entry in {{tbot entry}} is very useful. But of course that template is only temporarily in the entry. Someone on IRC made the same observation, suggesting we put the link somewhere in the body of the entry. Yes, there is an iwiki, but that is in a list on the LHS (in Monobook) and not as easy to use (especially if you don't happen to know the name of the language in the language ;-)

I was also thinking about the fact that we don't have translations tables in FL entries (of course!), we can't transclude them cross-wikt, and trying to copy them and keep them updated would be crazy. But linking to them would be very useful: it tells the user where to find that information (which is not at all obvious!), and strengthens the links between the wikts.

Putting those two things together: see mokry.

What do you think? Robert Ullmann 10:01, 17 December 2007 (UTC)[reply]

We already have a link to it in the translation section, and I think having it in the article would be even more beneficial. It would be useful for more things than just Translations so I don't think the link should go there. Could it go in the inflection line somewhere? Also is there a way to get the iwiki links in the sidebar to show that language name in English (or both like we do on Wiktionary:Main Page#Wiktionaries in other languages)? That might be a nice option for newbies, since it would make pages more comprehensible. --Bequw 17:09, 17 December 2007 (UTC)[reply]

Oh, they've already got that script w:User:Tra#Sidebar translator, and it looks nice (my usage). --Bequw → ¢ • τ 22:12, 19 December 2007 (UTC)[reply]

On the Romanian wiktionary, there's always a link to the native language's entry. It goes right under the level 2 language header. Check out ro:biti to see what I mean; it has 4 examples. — [ ric ] opiaterein — 17:48, 17 December 2007 (UTC)[reply]

I thought I already replied to this... I must be being stalked by oversight again </fun> I think that we should include these links, that the link text should be a bracketed ISO 639 code (fr) in the inflection line, or bracketed language name [[:fr:français|]], on their own line (like the Romanian wiktionary). I prefer including them in the inflection line, and think we should update the inflection line templates to include the FL wiktionary link before the first open bracket (or at the end).

mokry (pl)

amo (la) (infinitive amare, perfect amavi supine amatus)

Does this look ugly, make sense, other? Conrad.Irwin 00:02, 25 December 2007 (UTC)[reply]

Strangely placed, I think. If we were to do here what the Romanian wiktionary does, it'd look something like

Romanian

română

Noun

cafenea f

Beer parlour

which I think works pretty well. You could do a lot of things with it, but I think the best thing is to keep it near the level 2 language header. — [ ric ] opiaterein — 00:52, 25 December 2007 (UTC)[reply]

If we were to do that it would be nice to have a template that did this for us, just bung in {{head|ro}} and out pops "==={{subst:language|ro}}<small>[[:ro:{{subst:PAGENAME}}|({{subst:#language:ro}})]]</small>===" or words to that effect that do as you have done above.

Romanian (Română)

Incidentally I think we should still include the link to the foreign language wiktionary even if we don't know the word exists, but it would be nice to colour it red. Conrad.Irwin 02:56, 26 December 2007 (UTC)[reply]
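To make the {{head|ro}} idea concrete, here is a rough sketch of the wikitext such a template would need to emit. It is not an existing Wiktionary template or tool: the function `language_heading` and the small lookup table are hypothetical, and a level 2 heading is assumed because the link is meant to sit on the language header.

```python
# ASSUMPTION: a tiny hand-made table standing in for the {{language}} template
# and the #language parser function mentioned in the comment above.
LANGUAGES = {
    "ro": ("Romanian", "Română"),
    "pl": ("Polish", "Polski"),
}

def language_heading(code, page):
    """Build a language header that also links to the same page on the FL wiktionary."""
    english, native = LANGUAGES[code]
    return f"=={english} <small>[[:{code}:{page}|({native})]]</small>=="

print(language_heading("ro", "cafenea"))
```

For `cafenea` this yields a header reading "Romanian (Română)", with the bracketed native name linking to `ro:cafenea`.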

I was thinking about that, but I wasn't sure how well it would go over. The French wiktionary uses templates with a format like {{=fr=}}. It'd be a big change, but I think it would be cool. — [ ric ] opiaterein — 19:41, 26 December 2007 (UTC)[reply]

(clear indent) Well, I think it is easiest if we only have one template that does all of the headings. To this end, I created {{==}} which can be used, {{==|ro}} or {{==|ro|-}} or {{==|ro|+}} in much the same way as t+ and t-. The result is below. I wonder what would happen if we tried proposing something like this... The template 'must' be substituted, much as I dislike that, as otherwise the edit link wouldn't work, so it may be less easy to use than I hoped. Conrad.Irwin 20:56, 26 December 2007 (UTC)[reply]

English: {{subst:==}}; Lingala (Lingála): {{subst:==|ln}}; {{subst:zh}} (中文): {{subst:==|zh|+}}; Korean: {{subst:==|ko|-}}

The thing I don't like about the + - parameters is that they don't synch up automatically with the linked wiktionary. If they did, I'd be cool with it. — [ ric ] opiaterein — 22:36, 26 December 2007 (UTC)[reply]

Well perhaps we could ask our friend Mr. Ullmann, if his Tbot could be extended to add these links, that would be easier than the template idea. Conrad.Irwin 00:56, 27 December 2007 (UTC)[reply]

Note that you've seriously broken this page ... it will take a few edits to fix it, as there are now a number of extra L2 sections.

You can't change the language headers. Period. Too much software uses them, and we don't need this magic.

As I pointed out at the top, it would be useful to add this in a Translations section, which is otherwise "missing". If you try to add it pretty much anywhere else, you will break things left and right. If it is worthwhile, that might be okay; but as the link already exists as an interwiki, which is already automatically maintained I don't think that is a good idea at all. Robert Ullmann 11:53, 28 December 2007 (UTC)[reply]

Okay, took me 5 section edits to clean that up. This is all very bad. If we want another link somewhere it should be within the existing format, as I originally suggested. Robert Ullmann 11:59, 28 December 2007 (UTC)[reply]

Robert, the bots don't have feelings. You can go back and fix them. But that might just be too much work. :) — [ ric ] opiaterein — 12:08, 31 December 2007 (UTC)[reply]

Point taken though, sorry for messing up the page. Also with such templates, the [edit] links don't work. I would however like to see something like the Romanian Wiktionary, as those links could be useful to someone who speaks both. I quite like my idea of putting it within the inflection line, way up top somewhere. The problem with relying on the interwiki links is that, in theory, every Wiktionary should have every word, so the interwiki links are going to be hard to sort through.

Work in progress - draft documents

Not for the first time, I have come across a Marie Celeste of a page, abandoned months before (not usually in mid-sentence!) by a busy editor - and forgotten! Should we not have a category for all "Work in progress" and "draft" documents so that some check can be kept? —Saltmarsh^Talk 12:25, 17 December 2007 (UTC)[reply]

No. All of Wiktionary is a work in progress. — V-ball 16:04, 18 December 2007 (UTC)[reply]

This is true - but a Help page marked as "in progress" but untouched for 6 months has obviously been forgotten - and if it's in a rarely visited area it will remain so. Too many pages like this give a bad impression. (I may of course be guilty myself!) —Saltmarsh^Talk 07:29, 21 December 2007 (UTC)[reply]

Would a category actually help? Maybe there's an easy way to get the oldest pages in the Wiktionary: namespace? Without digging through the XML dump, that is. --Connel MacKenzie 08:07, 21 December 2007 (UTC)[reply]

Wouldn't be too hard to get something like the oldest pages out of the XML, it doesn't have creation dates, but as it is in database ID order, that isn't a problem (except for redirects within the namespace). Just looking through special:allpages should be reasonable; pretty much everything there should be referenced for the help index one way or t'other. (not counting subpages and redirects) Robert Ullmann 13:29, 21 December 2007 (UTC)[reply]
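The approach described above (sort by page ID as a stand-in for creation date, skipping redirects and subpages) could be sketched roughly as follows. The sample XML is a heavily simplified, made-up fragment: a real dump declares an XML namespace and carries far more per-page data, so the element lookups would need adjusting. The function name `oldest_pages` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: simplified stand-in for a dump; real dumps use an XML namespace.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Wiktionary:Entry layout explained</title><id>120</id>
    <revision><text>Layout rules</text></revision></page>
  <page><title>apple</title><id>45</id>
    <revision><text>A fruit</text></revision></page>
  <page><title>Wiktionary:Old draft</title><id>98</id>
    <revision><text>Half-finished</text></revision></page>
  <page><title>Wiktionary:ELE</title><id>150</id>
    <revision><text>#REDIRECT [[Wiktionary:Entry layout explained]]</text></revision></page>
  <page><title>Wiktionary:Old draft/notes</title><id>99</id>
    <revision><text>Notes</text></revision></page>
</mediawiki>"""

def oldest_pages(dump_xml, prefix="Wiktionary:", limit=10):
    """Return titles under `prefix`, lowest page ID first."""
    pages = []
    for page in ET.fromstring(dump_xml).iter("page"):
        title = page.findtext("title") or ""
        text = page.findtext("revision/text") or ""
        if not title.startswith(prefix):
            continue  # other namespace
        if "/" in title:
            continue  # subpage
        if text.lstrip().lower().startswith("#redirect"):
            continue  # redirect
        pages.append((int(page.findtext("id")), title))
    pages.sort()  # ID order approximates creation order
    return [title for _, title in pages[:limit]]

print(oldest_pages(SAMPLE_DUMP))
```

This only finds old pages; as noted in the reply that follows, it cannot tell which of them were left half-finished unless they are flagged somehow.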

But unless half-finished pages were flagged in some way ... —Saltmarsh^Talk 06:25, 22 December 2007 (UTC)[reply]

Perhaps they should just be RFD'ed when found. --Connel MacKenzie 19:34, 23 December 2007 (UTC)[reply]

.....or finished, maybe? :p — [ ric ] opiaterein — 17:02, 29 December 2007 (UTC)[reply]

Declension and conjugation tables

I have searched the archives and cannot find a discussion about this. I hope I searched well enough. If this has been discussed before, please point me toward it.

Recently, Daniel Polansky and I have been having a discussion because he made wholesale changes to the declension tables for Czech nouns. The discussion is at http://en.wiktionary.org/wiki/User_talk:Daniel_Polansky#Changes_to_cs-noun-prep_template. In there you can see the two differing lines of thought. Being a fan of consistency, I was wondering if there were other thoughts on the matter here in the Wiktionary community so we could get something somewhat standardized. I recognize that each language is different, but standardizing the foldability and the use of colors or not might be a good start — especially foldability. — V-ball 15:56, 18 December 2007 (UTC)[reply]

I have no particular opinion regarding what you call folding on that talk page. Regarding colors, though, I'm all for consistency and all against garishness. It looks more professional that way, in my opinion, and is easier to read. (This, despite the fact that I wrote garish conjugation tables myself (e.g.).) Greys (or other muted colors) all the way, using prettytable or what-have-you.—msh210℠ 20:58, 18 December 2007 (UTC)[reply]

Either color scheme looks fine to me, but I've been told that I don't always have the best color sense either. Personally, I dislike collapsible tables, except in cases where the table is really long, such as most verb conjugation tables or lengthy translation tables and the like, because those tend to overwhelm the rest of the article. For noun declension tables, I'd rather have the whole (small) table visible in the article. --EncycloPetey 02:40, 19 December 2007 (UTC)[reply]

I'm sure I've said before, but I suck at coding so I'm not sure how they do it, but Wikipedia has their foldables coded so that they're only folded automatically if there's more than one on a page or something. — [ ric ] opiaterein — 13:31, 19 December 2007 (UTC)[reply]

I made a proposal on the grease pit a while back to allow for editors to mark short folding tables as "open", which I think would be the best solution to this problem. Conrad.Irwin 00:52, 27 December 2007 (UTC)[reply]

Abbreviation POS header

Although the rule doesn't seem to be written anywhere, the standard, the way I understand it, is that we do not use the Phrase POS header where another does as well; for example, if a phrase is a verb, then we list it as a Verb, not as a Phrase. I assume (though I haven't seen anyone say this) that the logic is as follows: It's obviously a phrase (count the number of words, and see that there's more than one), so use the Verb header to show that it's also a verb (which is not as obvious).

The same would seem to apply to Abbreviation. If something is an abbreviation, but also a noun, then we should list it as a Noun, and put {{abbreviation of}} (or similar) in its definition line or Etymology. That abbreviation info will suffice for people to know it's an abbreviation; and the Noun POS header will give the non-obvious info that readers need: this is almost completely analogous to the phrase case.

This is the way I've been doing it (see, e.g., m.m.), and it seems the most reasonable way to me. What think you all?—msh210℠ 18:13, 18 December 2007 (UTC)[reply]

Makes sense to me. "Abbreviation" has always seemed odd to me as a POS header. Rod (A. Smith) 19:06, 18 December 2007 (UTC)[reply]

Agreed. --Bequw → ¢ • τ 13:19, 19 December 2007 (UTC)[reply]

We've tended to use Abbreviation as a "POS header" with the understanding that it is not actually a POS. We have had similar discussions in favor of headers like Letter, Initialism, etc. I wasn't around for the initial discussions, but the topic resurfaces periodically, so you can probably find lengthy discussions in the BP archives if you care to search for them. I think the last time the issue was raised, the proposal was to use Abbreviation (or Initialism, etc.) as the "POS header" and include (adverb) or whatever as context. Your proposal might be applicable, but the format would need tweaking. We don't normally put the entire definition line within parentheses. --EncycloPetey 02:44, 19 December 2007 (UTC)[reply]

Personally I wish ELE allowed "Verb Phrase" and "Noun Phrase" etc. Especially in cases when a "noun" is almost a dozen words long. It's not a big enough deal to raise swords over though. Language Lover 16:11, 20 December 2007 (UTC)[reply]

AutoWikiBrowser

How does one go about requesting AWB access to Wiktionary? I'd like to use it to simplify mass template conversion (specifically the deprecated forks of {{fr-noun}}). Circeus 05:36, 21 December 2007 (UTC)[reply]

I don't think there are any procedural restrictions on using it in regular mode, here; User:BD2412 has used it efficiently for quite some time now. Running it in bot mode (and/or more than 6 edits per minute) would require a couple things: a WT:VOTE and figuring out exactly how w:WP:AWB limits its list. Presumably, that would be Wiktionary:AutoWikiBrowser/CheckPage, but only User:BD2412 knows for sure. --Connel MacKenzie 08:03, 21 December 2007 (UTC)[reply]

Well, right now I can't do anything on WT (I tried). I don't need (nor want, I don't even know how to make a bot) bot mode. If it's anything like WP, one has to add (or request addition of) their name to a list. Circeus 17:20, 21 December 2007 (UTC)[reply]

Yep, that's all it takes. That does not require a vote, so I've added you to the list. You should be able to use AWB now. Cheers! Thanks, Connel, for the compliment. bd2412 T 22:46, 21 December 2007 (UTC)[reply]

Important: Do NOT enable "Apply General Fixes" or "RegexTypoFix"! (and "Unicodify entire article" is probably a bad idea.) They will seriously mess with page format and content. (For example, "errors" in quotations that are in original text must NOT be "fixed". Etc) Robert Ullmann 10:33, 22 December 2007 (UTC)[reply]

I never use those anyway :p Circeus 18:02, 29 December 2007 (UTC)[reply]

Splitting grammar from definition

Currently, in non-lemma entries of foreign language words, the relationship of the inflected form to the lemma is given instead of, or as well as, a meaning in the definition line.

I think that the relationship of the inflected form should be moved to the inflection line, and the definition line should be retained as the meaning. This has a few advantages: mainly, it reminds people to include a definition, so fewer clicks are needed to get the required information (higher visitor satisfaction and negligibly lower server load); it also makes extracting information easier, as the definition can always be found in the definition line.

For example, currently (and wrong in my opinion)

===Verb===

amat

third-person singular present tense of amo - he, she or it loves

I would prefer something along the lines of

===Verb===

amat (third-person singular present tense of amo)

he, she or it loves


Other questions that I am not sure about: does the translation need to be detailed, or can it just give the general meaning (just "love" in the example above)? Also, should forms of parts of words link to the intermediate stage, the lemma, or both? E.g. albiōris: genitive singular of albior, comparative of albus. What are anyone else's thoughts on this? I appreciate that it would take a very very long time to change what we currently have, and it wouldn't be high priority; however, there are a lot of entries that are not yet created, so it would be nice to create them in a better way. Conrad.Irwin 22:11, 21 December 2007 (UTC)[reply]

Well, I don't think this is specific to foreign language entries; inflected English entries also balance their link to the lemma entry. If we move the grammar to the inflection line (and the link to the lemma), then we're saying that the definitions below it are sufficient or self-standing. It seems tempting if there are few definitions for the lemma or if the term has basically a direct translation into English. But if there are several translations or several basic definitions, then keeping all redundancies up-to-date is difficult. And what may appear simple now will hopefully get richer as more editors work away. I support a short gloss (def or translation) but think the lemma should be in the definition line (aside from definitions that are specific to that inflected form). --Bequw → ¢ • τ 22:58, 21 December 2007 (UTC)[reply]

This is a really touchy subject, and some people get particularly heated over it. For the moment, that's all I can really think to say. — [ ric ] opiaterein — 23:34, 21 December 2007 (UTC)[reply]

Some Japanese words totally defy a direct translation. Specifically te-forms (or "conjunctive" forms if you prefer) of verbs. To directly define one of these would take dozens of senses and lots of detailed usage notes. Heck, if you can solidly grasp the "translations" of a general te-form, you're halfway to knowing Japanese :P Language Lover 02:35, 22 December 2007 (UTC)[reply]

It would be sensible, in such cases, to just give the definition of the lemma instead of the exact translation; however, that shouldn't be a problem, as the grammatical information needed to get a more exact translation would be given above. A dictionary has to assume basic proficiency on behalf of the reader. Conrad.Irwin 18:30, 22 December 2007 (UTC)[reply]

claudo is an example where there could be a problem. What entry do we have for "claudit"? The definition line third-person singular, present tense of claudo is simple (please forgive if my Latin grammar is wrong). Saying: he, she or it shuts, closes, imprisons, confines etc. is less tidy, and would need synchronising with the lemma form. The wordings such as third-person singular should of course be linked. —Saltmarsh^Talk 06:48, 22 December 2007 (UTC)[reply]

That poses a bit of a problem for words that have multiple definitions, the nicest way of doing it is "see claudo", unless we had transcludable definitions - but that is unlikely to be feasible. Perhaps "#see claudo: to shut, close, imprison, confine." would work, though it is slightly tautologous. Conrad.Irwin 18:30, 22 December 2007 (UTC)[reply]

And it would be wrong. The form claudo is first-person present active indicative, not an infinitive form, so you can't say that it means "to shut..." because only the infinitive has that meaning. And if you give the lemma meaning ("I shut, close, etc.") you confuse the reader who is looking at a page for the third-person form. --EncycloPetey 01:23, 25 December 2007 (UTC)[reply]

Maybe it would be fine in complicated cases not to have a definition line but only the inflection line, it seems wrong to me to include inflection related information in a meaning related part of the page. Another way to do it would be to write # see [[lemma#language]] as the definition, but I suppose that is only marginally nicer than the current way. Conrad.Irwin 18:30, 22 December 2007 (UTC)[reply]

EncycloPetey's brought up the Spanish verb tener before. How would we get all those definitions in forms, for every form, I don't know how to talk, but you get what I'm saying, I hope. The main argument was it would be too much to maintain — [ ric ] opiaterein — 19:00, 22 December 2007 (UTC)[reply]

Anyone who has ever taken a typical exam in a Spanish class, like most foreign-language classes, has had to do exercises like writing out conjugation tables with every form of tener in Spanish in all tenses and moods with their English translations. I refuse to believe we as a dictionary can't do the same, and with transcriptions, inflections, synonyms, and so on, as well, at some time in the future. Dmcdevit·t 02:24, 25 December 2007 (UTC)[reply]

This proposal is exactly what I personally tried for Bulgarian words (in Wiktionnaire). I created a number of pages, but I quickly realized that this technique was not reasonable at all. It is not uncommon for a word to have 5 or 10 senses and 40 inflected forms (for example). Would you write 400 definitions, and repeat usage notes in all pages? Listing all uses of an inflected form in the page (one per line) by providing grammatical information on each use is very useful: in French, it is usual for a verb form to be common to 5 uses, and looking for multiple occurrences of a form in a large conjugation table may be very difficult. Don't make it a theoretical issue: actually, I find that defining renards as Plural of renard is a more precise definition than defining it as foxes, which is a translation, but is sometimes misleading as a definition. Lmaltier 15:49, 23 December 2007 (UTC)[reply]

To expand on this line of reasoning, we can't put the grammatical information in the inflection line, because sometimes there is more than one to include. For example, the Latin adjective albus has a form alba that must include the following grammatical information:

All that information will not fit into a single inflection line neatly. For someone wanting to translate a text, knowing the gender, number, and case can be more important than having a meaning, since the grammar will determine which noun was being modified and what sense was intended. --EncycloPetey 20:19, 23 December 2007 (UTC)[reply]

For what it's worth, I've recently broken down and started using the above format, with example sentences to make the meaning clear. — [ ric ] opiaterein — 05:50, 24 December 2007 (UTC)[reply]

No Ric, don't give up doing it correctly because someone is whining that it is too hard. Just make a proper entry:

Adjective

alba, albā (declined form of albus)

white (without lustre), pale, fair, hoary, gray, bright
clear
fortunate, auspicious, fortunate

Declension

Template:ladecl1&2

Template:la-decl-1&2

at alba, go see. (Would be better if the #section references in the template were conditional, so alba would show black/bold; this word also has more translations/definitions than usual).

Note that having conjugation templates means we can show them in each entry. Readers don't have to trace from dumbed-down stubs to the "lemma" entry to find out the definition of the word. And now you have a perfect place to put examples (indicating the precise inflection for each you care to add; in some cases it might be useful to separate several inflections entirely.) You can add at will, not being forced by the format to create a dumb stub entry.

Yes, people will complain it is too hard. Creating a dictionary of all words in all languages is too hard; translating all the English words to at least 100+ other languages is too hard. If people want to create dumbed-down sub-minimal stubby entries, fine and good; that is better than nothing; but saying those are the desired form is utterly unacceptable. Robert Ullmann 09:29, 24 December 2007 (UTC)[reply]

Note that it still "points" to the "lemma" (which concept I'm starting to think is solely a product of dead-tree dictionary restrictions, grammars do start from the infinitive, but don't treat it as such a special case, it is often unusual in use ;-). And encourages adding more information, as opposed to the stub form, in which there is no obvious place to add anything. See WT:AJ and the Japanese entries where a similar situation obtains with script forms; the hiragana and rōmaji entries are simpler than the more complete entries, but still follow proper format; they don't pretend to "define" the rōmaji as "romaji form of ...", but give the simple translation. (see for example, um, rōmaji ;-) Robert Ullmann 10:27, 24 December 2007 (UTC)[reply]

That makes a lot of sense in my opinion, however it does leave problems with redundancy and inconsistency. What would be ideal would be to have all of the definitions and inflections transcludable, so that information can be updated in tandem, this is doable but could be made easier using extensions or perhaps bots. Conrad.Irwin 23:45, 24 December 2007 (UTC)[reply]

(unindent) With all due respect and everything, Robert... the above example formatting for "alba" is horrible. It doesn't tell you directly what actual forms of albus are covered. You can't give every form of a word an exact definition by itself. I could in Romanian, which is why I formatted entries with the form-of information on the inflection line. But when I thought about adding form-of entries in Slovenian, it was different. Accusative forms have the same definition in English as nominatives. There's no indication without an example sentence of the difference. Trying to define every form as if it were a word that could function alone is grossly illogical in some languages. It might work in Romanian, but in Hungarian? Finnish? No. And with the format you gave, if you changed one entry, you'd have to change all of them. When changing 1 entry means you need to change 27 more, your system is fucked up. There's nothing wrong with the example that EncycloPetey gave earlier for alba, except that it could probably use some example sentences. — [ ric ] opiaterein — 00:33, 25 December 2007 (UTC)[reply]

Robert, no one is saying that it's simply too hard. We're saying that the maintenance required is ridiculous. We haven't been able to agree on how to handle synching pages like color / colour yet, and this is far more complicated to keep in synch. Why not first try your proposal out on English entries, and see how many people start to complain then. Begin with the forms of the verb do. Keep in mind that it's not just the definitions and inflection tables that would have to be kept in synch with every edit, but the usage notes, synonyms, and all the other information you are proposing to add to every non-lemma page.

Could you explain how "the maintenance required is ridiculous" is not saying "it is too hard"?

Sync'ing pages is what wiki software does. We want to make the entries fully useful. And I'm not proposing that all the other stuff be added to every non-lemma page (although it should be allowed). I am insisting that every page have the definition(s). And we can figure out how to do it. Robert Ullmann 12:19, 25 December 2007 (UTC)[reply]

And since this is an English dictionary, the entry at does just cries out for a much better verb entry than "third person ..." of do, which doesn't help the person learning English (native or FL) at all. (Heaven help the non-native speaker trying to understand the preceding sentence by looking up doesn't ;-) Robert Ullmann 13:48, 25 December 2007 (UTC)[reply]

There's also much more than just the maintenance. Your proposed format makes it harder to determine which forms of albus are spelled alba because you have to go hunting in a table. And please don't keep calling your POV "proper format", since it's at odds with community practice and consensus. --EncycloPetey 01:16, 25 December 2007 (UTC)[reply]

No it is not my POV. It is what ELE says is proper format. What you are saying is not community consensus, when Rod tried to add it to ELE the vote failed. Robert Ullmann 12:19, 25 December 2007 (UTC)[reply]

You are now arguing semantics. You know perfectly well that ELE only covers the case for lemma forms, and that it does not reflect community practice on this issue. That was part of Rod's reason for attempting a fix. Unfortunately, his proposal covered more than just adjusting ELE to reflect current practice. --EncycloPetey 23:17, 26 December 2007 (UTC)[reply]

The irony of the statement that it's harder because you have to search in a table is, of course, that under the common system, you have to go hunting on a different page to even find out what the word you looked up means. And while both have a special place, word meaning is much more central to our articles than grammatical notes. Which isn't to say we should or even have to obscure the grammar. That example simply needs a more descriptive inflection line, or we might even need to rethink how we present inflection lines in cases where there are a lot of inflections represented by one form. Rather than "alba, albā (declined form of albus)" we could do something like "X (feminine plural form of Y)," "X (third-person plural conditional form of Y)" and so on. Personally, I think we should search out technical solutions for the long-term (section transclusion? Conrad is really on to something.) but that we should absolutely not be discouraging giving the proper meanings of words on the actual articles devoted to those words. Say they get out of sync. This is exactly the issue that we have with translations being on both English and foreign-language articles. They do get out of sync, but it's better than not trying. It's fine to say we should shy away from giving top-priority to high-maintenance tasks like this, but it's not very useful to argue against the whole concept. Dmcdevit·t 02:24, 25 December 2007 (UTC)[reply]

Yes, of course it should be (specific form of), it is just this example has a lot. Picking on (say) saltáramos might be better. And see estu, which is an excellent entry. Robert Ullmann 12:19, 25 December 2007 (UTC)[reply]

You don't need a questionable definition if you have example sentences.

The only way to format form-of entries with full definitions and with form-of information in the inflection line is to have several POS headers. That gets pretty messy. Check out am#Romanian. That big-ass mess was my doing, by the way :D I'll fix it eventually. — [ ric ] opiaterein — 02:52, 25 December 2007 (UTC)[reply]

I don't see the issue. Besides the duplicate headers which I just removed, that page doesn't look like any more of a mess than what other common words with several definitions look like. Which it is. And it looks much more complete this way. Note that we already need multiple inflection lines when a word has different senses with different. If you are simply concerned with formatting, then we should decide that ideally we would like this amount of information and then move on and decide how best to format it. But the format at am is not even close to being a dealbreaker as is, for me. Dmcdevit·t 03:16, 25 December 2007 (UTC)[reply]

I was only really using 'am' as an example for the multiple header thing. For definitions, there's stuff like the quintessent tener, with its 7 definitions. In the indicative alone, there are 30 forms, minus the few that share forms. Still, it's 210 definitions. That's a lot of shit to be changing if you want to add or reword a definition on tener, and then go through to change every form of... nobody wants to do that. Nobody will, and the quality will suffer for it.

Where are you going to put the 210 example sentences with translations? On the lemma entry? How many thousands of lines are you going to make "lemma" entries when someone wants to add usage examples for those learning all of the inflections of basic verbs. Robert Ullmann 12:19, 25 December 2007 (UTC)[reply]

I would put the example sentences under usage notes and class that as "entry information" (see #Re: Everyone), There is a lot of information that can and should be copied across, our entry layout at the moment is not perfect, but it is fixable. Conrad.Irwin 00:47, 27 December 2007 (UTC)[reply]

Defining words

When someone looks up a word in a dictionary, they primarily want the definition.

See response to this below. Language Lover 19:28, 26 December 2007 (UTC)[reply]

(Now continuing Mr. Ullman's original message)

Now keep in mind that the majority of users are not fluent in English, and of those that are, 95% don't know what "accusative" case is anyway. If they look up "alba", what they get is:

alba

xxxxxxxxxx xxxxxxxx xxxxxxxx of albus.
xxxxxxxxxx xxxxxx xxxxxx of albus.
xxxxxxxxxx xxxxxx xxxxxx of albus.
xxxxxxxx xxxxxxxx xxxxxxxx of albus.
xxxxxxxx xxxxxx xxxxxx of albus.

Not a comprehensible English word or definition in sight.

Even I find this pretty much useless; I know all this stuff; but if I am reading Latin and run across an unfamiliar word (very frequent of course if I am trying to ;-), I want to look it up and find the definition. I don't give a flying f--k about the inflection. (Either I already know the pattern, or it really isn't going to help much!) All I need and want is:

Adjective

alba, albā (who cares what declension of albus)

white (without lustre), pale, fair, hoary, gray, bright
clear
fortunate, auspicious, fortunate

Sorry, Robert, but that's kind of a silly claim. Given just this information, half the time you wouldn't be able to tell what noun alba is modifying: Latin incorporates that information into the inflection (via agreement in gender, number, and — most importantly — case), so doesn't always bother to make it obvious via word order. You're claiming that a simple "foo form of bar" definition is "dumbed down", but in fact it's you who wants to dumb down our entries. If a reader hasn't studied any Latin, then you already have your approach: they ignore the grammar info, click one of the twenty gazillion bold links to the entry for albus, and get a list of definitions. (Your complaint is what, that the reader needs to click a link? Cry me a river.) If a reader has studied Latin, then they'll find grammatical information useful if they ever want to, say, actually use the word, or even understand a sentence that uses it. (This is a bit trickier for languages that are still alive, since we'd presumably have some readers that understand the concepts but don't know the terminological conventions; neither your approach nor mine is very helpful to such readers. Mine is more helpful, since it at least presents them with the terminology to look up, and gives them prominent links to lemma forms so they can learn what the lemma forms are and have a chance of being able to use other dictionaries rather than being entirely dependent on Wiktionary's coverage, but I'll grant that neither of our approaches is ideal for this group. The solution, of course, is to find an approach that is ideal, rather than to give up on such readers and saying "screw it" to the notion of them learning something.) —Ruakh_TALK 07:17, 30 December 2007 (UTC)[reply]

"Syncing" entries

This is what hypertext and transclusion are supposed to be all about. It is utterly astounding to find, in this context of all possible contexts, people complaining that it is too hard to generate pages with content shared with other pages. It is what the wiki does.

I have noticed, all through my s/w eng career, various customers (and/or management types) say "No, we don't want to do X". I say "Why not, wouldn't it be useful?" And they invariably say: "No, it is TOO HARD. We don't want to pay you to do that." And every single time it is something effing trivial to do. (In one notable case, it would have taken a bunch of extra code not to.) Just 'cause you might think something is hard, does not mean it might not be really easy. Once done. (This case isn't hard, and may indeed be trivial once fully understood.)

There have been various ways tried to do this here, none satisfactory; but also none really looking at the requirements and implementing something that meets them. This is why I keep going on about the infernal whining. We don't have to have dumbed-down entries just because we haven't found the right bit of tech to do what we want. We certainly do not want to standardize the dumbed-down format because doing it right looks too hard to some people.

There are at least two ways to do section-sync very nicely; one requires parser support (which, note, we can get if we want), the other is just a SMOC. And there are probably others.

One thing that would be very good is automation for a given language to sync various entries including the language wikt.

Another example is that I can sync the Mandarin Pinyin syllables (which, note, have definitions ;-) after each XML dump. Etc.

We can do it. I will not listen to "too hard" or "ridiculous amount of maintenance" or "lot of s--t to be changing" and settle for dumb stubs. See? Merry Christmas (Festivus, etc) Robert Ullmann 12:19, 25 December 2007 (UTC)[reply]

(note to Dmcdevit and Conrad: don't tell anyone else here; they don't want to make entries more useful and complete, they much prefer to think that a solution to syncing content is out of reach. They're right of course, it's just too hard. Shhhh!) Robert Ullmann 14:53, 25 December 2007 (UTC)[reply]

masturbam - Lovely. — [ ric ] opiaterein — 17:05, 25 December 2007 (UTC)[reply]

Certainly a shuffle in the right direction, at least the definition is there. Conrad.Irwin 02:15, 26 December 2007 (UTC)[reply]

Very well said Mr. Ullmann. I definitely think that this would be very very useful, we just need to discuss in what way precisely these are going to be used (I am assuming we can persuade someone to install the (very exciting) section extension for us). This would allow us to wrap the definitions and anything else that would be wanted (possibly the whole PoS section) and keep them in sync across all the non-lemma entries. From initial experiments on Wikisource, it seems that we can't hide the <section> tags in templates - so it would require more learning on behalf of our contributors - but not to any great extent.

If we were to transclude the whole of the PoS section, which after consideration seems like the best option, the only difference (in display) between "lemma" and "inflected form" would be in the inflection line. We would have to decide whether to go for the minimal "inflected form of" or to extend it, perhaps over several lines, to list the possible forms that it actually is. I prefer the idea of telling the reader directly, though it is also possible to just let them look it up in the transcluded inflection tables.

The main drawback (not problem) is that, in order to create inflection specific information, it would either have to be included (out of place) before or after the transclusion, or to be included in the "lemma" entry surrounded by a conditional that only displays it on the right page (easy to wrap in a template). The third and worst solution would be to {subst:} a stale copy of the "lemma" entry and then edit that separately. I think that the "surrounded by conditionals in the lemma" is by far the best solution, though it requires people to find the lemma entry to edit (trivial with a small amount of coding). I don't count the fact that the English translation of the foreign term would be that of the lemma and not the inflected form as a disadvantage, I think that saying (highly paraphrased) "amat he/she/it loves" looks tacky, "amat from amo I love" is much neater (and that is a trivial example - wait 'til we get to the gerundive, or participles).

Other things that we need to consider. Etymology is outside of the PoS, this means that it would have to be transcluded separately (I think for all the other trivia sections this doesn't matter as they are form specific). Unless of course we want to say that the etymology of a derived form is simply that it evolved from the lemma, and let those who are interested look it up further - but the whole reason of transcluding sections would be to avoid the "see lemma" plague. (Though as I don't generally look at Etymology, I wouldn't mind ;).

Finally, boringly, a proposal for a naming standard of sections (ISO)(Pos)[1-9]. <section begin=laNoun /> <section begin=frVerb3 /> etc. Sections should begin after the inflection line, and end before the next Trivia, PoS or Language heading.
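
The naming convention above could be checked mechanically. The following is only a sketch under one assumed reading of "(ISO)(Pos)[1-9]": a lowercase two- or three-letter language code, a capitalised part-of-speech name, and an optional digit from 1 to 9. The helper names `section_name` and `parse_section_name` are hypothetical.

```python
import re

# Assumed reading of "(ISO)(Pos)[1-9]": language code, PoS name, optional digit.
SECTION_NAME = re.compile(r"([a-z]{2,3})([A-Z][a-z]+)([1-9])?")


def section_name(lang, pos, index=None):
    """Build a section label such as 'laNoun' or 'frVerb3'."""
    name = f"{lang}{pos.capitalize()}{index if index else ''}"
    if not SECTION_NAME.fullmatch(name):
        raise ValueError(f"not a valid section name: {name!r}")
    return name


def parse_section_name(name):
    """Split a label into (lang, pos, index); return None if it does not fit."""
    match = SECTION_NAME.fullmatch(name)
    if match is None:
        return None
    lang, pos, index = match.groups()
    return lang, pos, int(index) if index else None


print(section_name("la", "noun"))      # label for a Latin noun section
print(parse_section_name("frVerb3"))   # components of a numbered French verb label
```

Multi-word headers such as "Proper noun" do not fit this pattern as written, so a real convention would need to say how they are spelled.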

Is there any chance we can ask a developer to install the solution, even if it is initially just for playing around with (maybe we should get it for a couple of weeks on wiktionarydev first)? Conrad.Irwin 02:15, 26 December 2007 (UTC)[reply]

Though that is a useful extension, the transclusion approach still seems to me unsatisfactory. The problem is that all the definition lines on the "lemma" page are written in a different form than they should be for the definitions on one of the inflected form pages. So, if for example we transcluded the POS section of tener#Spanish on tengo, it would read "1) to have ... 2) to possess..." instead of "1) I have... 2) I possess..." Is there any way around that? --Bequw → ¢ • τ 04:52, 26 December 2007 (UTC)[reply]

The entry page would then read "tengo, a form of tener, 1) to have 2) to possess etc." While in the first person case there is not much difference, if we go to tenga, your solution would then have to read "tenga, a form of tener, 1) I may have, he may have, she may have or it may have 2) I, he, she, or it may choose" etc. Notwithstanding the fact that the subjunctive can be translated in different ways depending on context. In inflected forms for which there exists no exact correspondence into English, which is very common (there were mentions of Japanese forms earlier in this conversation that defy translation), the only nice solution is to define the lemma and let the user decide how exactly to translate the inflected form, based on their knowledge of the language. We could even help them by providing some general English grammar aids, and possibly foreign language grammar aids, but these shouldn't be in dictionary entries. Conrad.Irwin 12:31, 26 December 2007 (UTC)[reply]

I remain unconvinced that 100% exact definition sharing is ever appropriate, in any language. (Ric's masturbam example, being one such example.) On the face of it, exact-sharing seems like a petty way to propagate limitations of paper dictionaries, while at the same time, limiting the usefulness of entries here. Synchronizing elements that should be synchronized is helpful; neutering entries because of laziness is not. An example given above suggested 400 repetitive definitions (not true of course - there is plenty of variety in how those are translated and defined) but instead of pursuing methods for automating (or semi-automating) such synchronization, the suggestion is to thwack 399 entries? Sorry, but there is no way that can ever fly. Even if you were to recommend that as an approach only for a particular language, it still would be unworkable, as it would not successfully deal with shared spellings in other languages. When all is said and done, an English reader needs to be able to look up a FL word, see the definition (in the appropriate English inflection) and what form it is, with the correct conjugation information. The intent all along has been to have bots help where appropriate, allowing for manual minor corrections on specific entries. Forced-syncing precludes corrections and can never be viewed as viable, even for languages that have very few exceptions to their conjugation rules (at present.) --Connel MacKenzie 08:51, 26 December 2007 (UTC)[reply]

I am not sure exactly what you mean by this, the proposal doesn't involve any thwacking as far as I am aware. The reason that we would need to use the Labeled Section Transclusion extension is so that we can extract the correct section from the lemma entry. This is a method for automating synchronisation, a very nice, already existing technology that does almost exactly what we need - in a more general way, rather than trying to hack our own en.wikt specific way of doing it, which would, almost certainly, lock us into our current conventions. Creating the non-lemma entries can still be done, if not more easily, by a bot - which would just have to create a {{#section:amo|laVerb}} to fill in the entire entry.

There are, that said, two issues. Firstly, any translations that are given in the lemma form would be translations only of meaning, not of exact PoS matches (an advantage in my opinion, but I can see that other people may differ). Secondly, example sentences (if they stay in their current position) would be limited to only one form, unless we have (easy to do) conditionals or switch statements around the example sentences, or move them to a new section (perhaps Usage Notes) and don't transclude them. To reiterate, these are merely issues, not big problems. Conrad.Irwin 12:31, 26 December 2007 (UTC)[reply]
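The extraction step described above, pulling one labeled section out of the lemma entry as {{#section:amo|laVerb}} would, can be roughly simulated as follows. This is only a sketch of the idea, not the extension's actual implementation: the function `extract_section` and the simplified page text in `LEMMA_PAGE` are hypothetical.

```python
import re


def extract_section(wikitext, label):
    """Return the text between <section begin=label /> and <section end=label />, or None."""
    name = re.escape(label)
    pattern = re.compile(
        r"<section\s+begin=\"?" + name + r"\"?\s*/>(.*?)<section\s+end=\"?" + name + r"\"?\s*/>",
        re.DOTALL,
    )
    match = pattern.search(wikitext)
    return match.group(1).strip() if match else None


# Hypothetical, heavily simplified lemma page; the section starts after the inflection line.
LEMMA_PAGE = """==Latin==
===Verb===
'''amo'''
<section begin=laVerb />
# to love
# to like
<section end=laVerb />
"""

# Only the definition lines are returned; the headings and inflection line stay on the lemma page.
print(extract_section(LEMMA_PAGE, "laVerb"))
```

Because the non-lemma page holds only the reference, any correction made to the definitions on the lemma page appears on every inflected form without a separate edit.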

Not all words in other languages have exact English equivalents

As one example for why we can't give translations for every form-of word we have, let's look at a couple of things, starting with Romance languages and the subjunctive mood. I'm not sure exactly how it works in other languages, but in Romanian, the subjunctive when used alone translates with "should".

Să mănânc = I should eat.

However, the subjunctive is used in other constructions far more frequently.

Vreau să mănânc = I want to eat.

But you can't say "I to eat" in English (without looking like a retard). If you were to put "I should eat" in the definition line of "mănânc", it would be (in a way) incorrect and misleading.

A number of languages have plural forms of words whose closest English equivalents are uncountable. How are we going to give translations for them? "More than one X" looks great, right? :)

The instrumental case of Slavic and other languages also poses a slight problem. "by x", "by use of x", "by using x", "using x". Which would we use?

There is not, and never will be, an exact English equivalent for every word in every language. Trying to give definitions for every form-of word in every language is not a good idea. So as for formatting, there will always be form-of information, but there will not always be an exact definition. As such, the best place for that information is generally in the definition line, because there won't always be a definition to go there. — [ ric ] opiaterein — 18:26, 26 December 2007 (UTC)[reply]

See #Re: Everyone, I made a mistake when I proposed translations for different parts of speech. Conrad.Irwin 00:41, 27 December 2007 (UTC)[reply]

"When someone looks up a word in a dictionary, they primarily want the definition."

Way above, Mr. Ullmann made the claim that "when someone looks up a word in a dictionary, they primarily want the definition".

I don't think that's true in general. There are two kinds of people who use a two-way dictionary. There are translators (which includes people who are just doing it on their own for fun), and there are language learners. What Robert said is probably true of the former. But it is NOT true of the latter. To attempt direct translations, in most cases, HURTS the language learner. Especially a newbie language learner who isn't yet very skilled at learning languages, detecting idiomaticity, etc.

Furthermore, MONOLINGUAL dictionaries don't try to give definitions of inflections. They say things like "past tense of blah". This remark goes for electronic as well as paper monolingual dictionaries. It goes for the English entries in wiktionary! Language Lover 19:28, 26 December 2007 (UTC)[reply]

I see what you are saying, but I think there is a subtle difference between the terms "definition" and "direct translation" which you are treating as synonymous. The dictionary should give the meaning of the word (the definition "caress"), and not the direct translation of every form "he/she/it caresses". This is why transcluding the definitions from the lemma would work well (as above). (Just because other dictionaries do that, doesn't mean we have to copy them - indeed by copying the definition from the lemma, we wouldn't be copying groan) Conrad.Irwin 21:12, 26 December 2007 (UTC)[reply]

If you're saying what I think you're saying, then every form-of word would get the same definition as the lemma. Is that correct? If so, I shouldn't even have to say why that's such a bad idea. — [ ric ] opiaterein — 22:17, 26 December 2007 (UTC)[reply]

I honestly do not know why transcluding the contents of the lemma entry to its inflected forms can be considered a bad idea. Read the section on syncing entries above, it would make Wiktionary a much better resource, and a much more structured one. Conrad.Irwin 23:05, 26 December 2007 (UTC)[reply]

...it would mean that, instead of saying that foxes is the plural of fox, the entry would be:

Small carnivores (Vulpes vulpes), related to dogs and wolves, with red or silver fur and a bushy tail.
Any several of numerous species of small wild canids similar to the red fox, this term describing several members of at least five genera (see the Wikipedia article on the fox).
Fox terriers.
Cunning people.
(slang) Attractive women.

And the same would apply to all plurals, comparatives, past tenses, etc. across the whole of Wiktionary. Frankly, I think that "plural of fox" is a much more useful (and certainly more concise) explanation in this case. --EncycloPetey 23:10, 26 December 2007 (UTC)[reply]

Because fuseserăm doesn't mean "to be", it means "we had been". — [ ric ] opiaterein — 23:10, 26 December 2007 (UTC) plus all of what Petey said above — [ ric ] opiaterein — 23:12, 26 December 2007 (UTC)[reply]

I refer this topic to #Re: Everyone. Translations that include grammatical information are beyond the scope of a dictionary. Once the grammar has been split from the meaning then by transcluding the meaning, we can concentrate on the grammar. This does not mean sacrificing the grammatical information, it means supplementing it. Conrad.Irwin 00:27, 27 December 2007 (UTC)[reply]

A problem with overlapping

Whatever strategy we decide on, it should be a strategy which would work for any other language's wiktionary. Thus it makes sense to look at English examples.

Consider run. It has 30 verb senses and one of them is "Past participle of run". You agree of course we should be consistent, however we do things. If you say we should put all the translations of past participles in a past participle entry, that means we should duplicate the 29 non-past-participle senses of run and add them to the entry in past-participle form. And if you're in the camp where the lemma is referenced in the inflection template, how would that be handled? We can't put it in the same inflection template as the original entries. Should we make a 2nd verb header for the past participle senses?

Again, in Japanese, nouns are always their own plurals. So should 猫 have two senses, one for "cat" and one for "cats"? You might say this is a special case since ALL nouns are their own plurals, but the same difficulty goes for German, where some nouns are their own plurals and some aren't.

This and the previous L3 section pretty well illustrate why it's so wise to use a lemma-reference for the definition, even if synchronization difficulties could be fixed. Language Lover 22:12, 26 December 2007 (UTC)[reply]

Yet another problem with the proposal. Japanese potential verbs: 食べられる means "to be able to eat". But since it's a potential, special grammar applies, it can't take a direct object. That doesn't mean it's intransitive... it CAN be used to say "can eat fish", just not with the direct object particle. If that's confusing, it illustrates why a direct translation would mislead our users. Now, if there were a transitive verb which directly meant "to be able to eat", and wasn't a potential conjugation of a lemma, then that WOULD be able to take a direct object. To distinguish the two we'd need to add usage notes and now things are just getting out of hand. Language Lover 22:33, 26 December 2007 (UTC)[reply]

I have moved on from proposing different translations for every inflected form, that was a mistake. See below for what I should have meant. Conrad.Irwin 00:20, 27 December 2007 (UTC)[reply]

Re: Everyone

I will try to clarify this here as it is central to this entire thread and it confused me for a long time.

"Present participle of run" is not a definition, it does not mention the meaning of a word; It should not be in the definitions section.
"Present participle of run" is grammatical information; it should be in the inflection line or, in some cases, an inflection section.
Direct translation of entries including part of speech "do not work" because it is mixing "meaning" and "grammar".
Translations of "meaning" are what a dictionary provides, translations of "grammar" can be done with a grammar aid.

In a similar way all of the information in a current 'entry' can be divided into two parts "Entry information" and "Word information".

"Entry information" is everything that is specific to one "inflected form of a word" or one "arrangement of letters"

e.g. Part of speech (inflection), Anagrams, Pronunciation, Rhymes

"Word information" is everything specific to one "word and all of its inflected forms"

e.g. Etymology, Meaning (definitions), Synonyms, Translations, Related terms

There are a few sections that don't divide very cleanly, 'External links' could be either, but is more likely to be "word information". It is also possible that one inflected form of a word picks up a meaning that the other forms don't share. There will always be exceptions for those who want to find them - in most cases little thought is needed to find a reasonable solution (which is still at least as good as the current situation).

Each Wiktionary entry can contain multiple words, each entry can have multiple forms of the same word. The "entry information" tells the reader which forms of which words are there. The "word information" tells the reader what the words mean, it should be transcluded from a central copy to avoid copying errors and needless duplication.

Any questions about that, or specific words which will "never work" with this scheme, I will happily address. However read and understand the bullet points first. Conrad.Irwin 00:20, 27 December 2007 (UTC)[reply]

I've replied in depth below, but will offer a simple challenge to demonstrate that the problems in your proposal are far from trivial. Set up a text page (with link) showing exactly how you would handle the entries for German and Germans. Use your proposed transclusion of the "word information", please. You will find that your distinction between "entry information" and "word information" above does not hold...and this is a fairly simple case! --EncycloPetey 07:04, 27 December 2007 (UTC)[reply]

German was a simple example, I created it the way I thought it should go, it worked exactly as I thought it should. Ullmann then edited it to make it work the way he thought it should, and again it worked correctly. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

OK, so now my question: Will the resulting formatted page be easier for new users to edit, or will it make it harder for new people to figure out how to edit the definitions? I note that only two of the three definitions have been transcluded, and that they are still organized under specific parts of speech, not in "definitions" sections. I also note that you have included no example sentences or quotations. I would like to see the full format as proposed, including Synonyms, Translations, and everything else you said was "word information" that should be transcluded. --EncycloPetey 20:07, 27 December 2007 (UTC)[reply]

Apologies, I hadn't realised that Ullmann's edits broke my version (now fixed). The reason only one PoS is included is that only one PoS exists in the form "Germans". I also see no reason to remove the headings, they work well at the moment. Usability is one of the biggest concerns. The differences will be that some of the information will have to be edited at the lemma; the [edit] section links will point to the correct places (or can trivially be made to do so). There will also be some <section> tags in the lemma entries, but I feel they can be ignored by those who don't understand them.

I'm having trouble seeing bullet points 1,2, and 4.

1: "Present participle of run" IS a definition. And it's the best definition we can reasonably give. Ignoring all the lesser senses of run and focusing on the physical activity, we could directly define "running" like this:
1. (n.) a state in which one is rapidly moving one's body by kicking the ground
2. (adj.) in such a state

but this definition takes special consideration to come up with. It would be WRONG to just copy the definition of "run", because "run" and "running" do not mean the same thing. The point is, it takes human effort to make a definition like the above, hence synchronization issues. Repeat for the other 28 senses of run.

In teaching myself Japanese, I've been weaning myself off bilingual dictionaries and starting to struggle (very painfully) with a monolingual Japanese dictionary. Trust me when I say this, I am MUCH happier with simple "Past participle of blah" type definitions there than I would be with detailed definitions from scratch. To a non-native-speaker just learning the language, the latter are incomprehensible.

2: While "present participle of run" IS grammatical information, that doesn't mutually exclude it from being the definition. If we put the run-connection in an inflection template, it'd no longer be an inflection template: it'd be a dis-inflection template.
4: I don't think most people would agree that dictionaries have to be devoid of grammar. Certain entries like "the" are necessarily grammatical and CAN'T really be "defined". Language Lover 03:18, 27 December 2007 (UTC)[reply]

As I said above, "Present participle of run" is not a definition, it gives the reader no indication of what the word actually means. It should be in the Wiktionary entry as it aids in understanding of the sentence, but the Wiktionary entry should also contain definitions. I have to confess I may have been loose with my technical terms, (feel free to point me to an introduction to linguistics that is better than wiktionary :). By both inflection and grammar I meant, "anything that shows how the word changes depending on implied context (inflection and dis-inflection if you prefer)". I agree that some words are inherently grammatical aids, and so do not have a meaning in the sense I implied.

Run and running both imply the same meaning, they are for use in different places. For example, in Latin, nouns decline, in English, nouns don't so all the noun forms in Latin, although they look different go to the same noun form in English. Yes they have different sense, but they have the same meaning. (I have to apologise that I am bad at explaining this, I will try a different tack) Words have different inflected forms to change the meaning of the "sentence" that they are in, not to change the meaning of the "word" itself. For example "He loves her", "She loved him", in both of those sentences, He and Him imply "a non specified male", She and her both imply "a non specified female", loved and loves both imply the strong feeling of attraction between them. The words themselves do not change meaning despite being inflected, but by displaying different forms they change the feeling of the sentence.

I agree with you about translations: it is not possible for us to give direct translations of inflected forms of words; they add a whole level of imprecision and hinder the learning of language, beyond the basics. (They encourage thinking of foreign parts of speech as direct equivalents of their closest English counterparts, which in many cases they are not: take the subjunctive in Spanish, the gerundive in Latin, etc. There may be some cases in which they do correspond directly, and the translation does not sound stilted to the native speaker, but that is unlikely.) That is the whole reason that the definitions, translations, synonyms, antonyms, etc. can be directly transcluded from the lemma entry. See run/experiment and runs/experiment for initial, very crude, ideas. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

For what it's worth, it seems quite appropriate to show "dis-inflection" details in the headword/inflection/grammar line. Rod (A. Smith) 21:11, 27 December 2007 (UTC)[reply]

Conrad, the whole point of having a lemma form and non-lemma forms is precisely in order to keep the definitions and grammatical information as separate as it is feasible to do. All the definitions are placed on the lemma entry, and the non-lemmas contain only grammatical information, unless there is a definition unique to a particular form of the word. This is one reason why we have nothing called a "definitions section". There is no such section. There are sections for parts of speech, and on pages for lemma entries these are subdivided into an inflection line, definitions, and subsections for additional information related to that part of speech. For non-lemma entries, the part of speech section consists of simple grammatical information and a pointer to the lemma for the definitions.

Sorry for replying to sections individually, it allows me to collect my thoughts properly. The idea behind lemma and non-lemma forms is fine, but we treat the grammatical information in the same way as we treat the meaning information; that is wrong, and is the source of this confusion. Once we have successfully sorted the grammatical information from the meaning information, the meaning information can be easily added to each inflected form of a word, because they all have the same meaning and are just used in different contexts. At the moment the pages are divided by Language, then Etymology, then Part of speech, then attributes (inflection line, definitions and dis-inflection, synonyms, translations, etc.). I agree that having these entries pointing towards the lemma is one solution, but it is far from the best. Instead of asking people to click on links, encouraging anyone to leave "form of" entries as soon as they find them, thus discouraging improvements, we can and should provide them the information that they would have picked up from the "lemma" entry, encouraging these entries to develop and making Wiktionary easier to use. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

It's not "wrong". The meaning of a word is grammatical information. Linguists regard meaning (semantics, in the sense you are using it) to be included within grammar. They cannot be considered separately. That kind of reductionist thinking was abandoned. --EncycloPetey 20:11, 27 December 2007 (UTC)[reply]

It seems you're (quite literally) arguing semantics here. Although there isn't always a precise dividing line between grammar and meaning, it's quite useful to distinguish between the grammatical properties of a word (e.g. "plural nominative noun") and the semantics of the underlying lexeme. Our conventional non-lemma definitions (e.g. "plural of word") distinguish between those two aspects of the word by using grammar terms and merely mentioning the lemma term for semantics. Conrad's approach seems no more problematic than that conventional format. By the way, Conrad, the linguistics term lexeme denotes what you're calling the "word". Rod (A. Smith) 21:11, 27 December 2007 (UTC)[reply]

Frankly, it is foolish to think that grammatical information is completely separate from the definition of a word. Most importantly, because every definition functions as a particular part of speech. If you truly believe that meaning and grammar can be divorced from each other for each of our entries, then you should read some serious linguistics discussions of the topic. Most linguists study words as the basic unit of meaning, and a word can exist in multiple forms. We put all the information pertaining to the word on the lemma page, and have the various non-lemma forms point to that lemma page. Duplicating the word information on multiple pages is needless repetition.

That is a fault with our definitions, if it exists. It is impossible (read: beyond our scope) to provide direct translations of each part of speech, because, even more so than words, different parts of speech do not translate exactly, or even at all in many cases, between languages. For example, in the English entry run, the Latin "translation" currere is (correctly) given; however, run could, depending on context, be translated as "curro, curris, currit, currimus, curritis, currunt", and that is just the present active indicative; run is also used in the passive indicative, active subjunctive, etc. This is correct, because the meaning is the same between the words, but it is not mentioned because it is not the kind of thing that should be in a dictionary. I doubt I am at all clear, and I would love it if you could point me to some introductory linguistics; Wiktionary is the only place I have ever discussed it.

No, it is not the fault of our definitions, it is an inherent feature of language. Linguists consider meaning to be one aspect of grammar. It is built into linguistic theory, not somehow a product of faulty definitions. It seems you are proposing all these changes without knowing basic linguistics, or speaking any languages other than English. --EncycloPetey 20:07, 27 December 2007 (UTC)[reply]

Perhaps I should have been clearer: our definitions are not faulty at the moment. If there was a definition that was not given in the infinitive, the (reasonably) neutral part of speech that seems to be in use in dictionaries, then it would be faulty. I see little reason not to define the infinitive (of verbs), as it is done in other places, though it might be better to remove the 'to's from the definitions, rendering it completely PoS neutral. Nouns are currently completely correct (in my view), mainly (I suspect) because they don't inflect normally. Conrad.Irwin 22:06, 27 December 2007 (UTC)[reply]

But the lemma isn't always the infinitive. In Latin and Ancient Greek, the lemma is the first principal part (first-person singular present active indicative), because that is what all such dictionaries use. The reason not to define the infinitive is that it often doesn't function as a verb. More often, the infinitive in Latin (and Greek?) functions as if it were a noun. Further, there are active forms and passive forms, with a very real difference in meaning. Consider (in translation): "I was leading." versus "I was being led." In Latin, both of these sentences would be written as a single word, and all the difference in meaning comes from the inflectional form. You cannot include the same set of definitions, translations, etc., on both entries and hope to help anyone better understand the words by doing so. --EncycloPetey 22:31, 27 December 2007 (UTC)[reply]

Hmmm, I wasn't thinking hard enough about that. It seems that whichever form is chosen for the lemma, the English "definition" is written to correspond to the part of speech. That makes sense, in English, the chosen lemma is the infinitive. The problem arises when you transclude the definitions from foreign languages that don't use the infinitive as the lemma, for then the definition reads more oddly. Urgh. I still feel that even if the part of speech is consistently wrong, it does not matter greatly, the first person singular is also understandable as a general translation as it is, in most languages, the simplest (or first learnt) form, I will think about this more. Conrad.Irwin 23:19, 27 December 2007 (UTC)[reply]

Part of the reason we do this is that translation becomes impossible if you do not use lemmas. Consider that English has a single infinitive form for each verb, but that Latin has six (with a total of ten inflected infinitives). How then do you set up translations between 1 English infinitive on the one hand and 10 Latin infinitive forms on the other? Using our current system, one is pointed to the Latin lemma, which lists the various forms. Under your proposal, the user would have to investigate all 10 forms to determine which one is appropriate.

I know, that is why I proposed we transclude the translations from the lemma entry. There is no need to force people to click on a link when the definitions can be brought under their fingertips with minimal effort. The user would not have to look at any other forms, because all of the information would be at each form. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

But that would be positively misleading to transclude translations. The translations will not apply to the various forms. You are proposing to list the lemmas as translations of the non-lemmas. Even the lemmas don't have one-to-one correspondence. Consider the Latin adjective albus, which means "white, clear, ...(etc.)". Its comparative form albior does not mean "white", but means "whiter, more white; clearer, more clear" or it can also mean "rather white, rather clear, ...". You therefore can't transclude the definitions, because the meanings change. Likewise, the Latin adverb ēlātē (“loftily, proudly”) has a comparative form which means "more loftily, more proudly". The definition changes when the word is inflected, because the inflection carries part of the meaning. --EncycloPetey 20:07, 27 December 2007 (UTC)[reply]

I'm unconvinced that transcluded translations are misleading. To readers who understand inflection for the source and target languages, it's obvious that the translation targets will need to be inflected. To those who don't understand inflection for the source and target, there's little harm, since they'd be unable to produce grammatical translations anyway. Rod (A. Smith) 21:11, 27 December 2007 (UTC)[reply]

The people who do understand the inflection and language will quickly realize that all the translations are "incorrect" for the pages on which they appear. The users who don't understand inflection will not be enlightened either. So what is gained? The current system makes it clear that the various inflected forms are just that. The proposal would make it much harder to tell what is going on. Any change to what we do should make it easier on the editors and/or users, not harder for them to use and understand the entries. This proposal is designed only to make it easier for the bots, at the expense of both the editors and users. --EncycloPetey 21:37, 27 December 2007 (UTC)[reply]

Well, there is a way to adapt this approach to allow alternate inflections to appear. I'd show you an example in the Wiktionarydev site, but it seems to be down now. I like this approach because it seems to be a compromise between the approach you and I prefer (i.e. excluding translations from non-lemma entries) and the approach Connel et al. prefer (i.e. including non-lemma translations in non-lemma entries for level 0 students who want grammatically questionable translations without an extra click through to the lemma). Rod (A. Smith) 21:54, 27 December 2007 (UTC)[reply]

I suppose we could then say "Synonyms of `lemma`" "Translations of `lemma`" to make it completely unambiguous - though I don't think we need to. Or, which I would also be happy with, just transclude the definitions from the lemma, under a heading "definitions of `lemma`", and add "for further information, go to `lemma`". I think a dictionary entry that does not mention the meaning of the word, is near useless. Though, transcluding the inflection would also be possible, because it does not depend at all on part of speech. Incidentally, this approach was intended to make things easier for the reader, it does not have a negligible effect for most bots. Conrad.Irwin 11:27, 28 December 2007 (UTC)[reply]

No, because that makes it harder for the bots, and one of the proposed advantages of this approach is to make bot maintenance easier. If the header is different for every page, we lose that uniformity that currently makes it possible for us to easily identify and correct mistakes in section headers. And your proposal is now to include a "for further information, go to 'lemma'"? That's what we already do with the {{plural of}} or {{comparative of}} templates and such. Also, remember that the whole point of the lemma is to be the dictionary entry. The non-lemmas are not designed to exist independently of the lemma, but I fail to see how that makes them "useless". Consider: (1) We have many entries for common misspellings. These are not words and thus cannot and should not have definitions, but that does not mean they are useless. (2) The definitions are not the only way to communicate meaning. I would agree that non-lemmas should have example sentences and quotations, for example. These sections communicate meaning in a way that complements and sometimes exceeds what the definitions do. And I strongly disagree that the approach makes things easier for the reader, on many levels, but that opinion should already be obvious. --EncycloPetey 19:42, 28 December 2007 (UTC)[reply]

I really can't understand what you mean by your points 3 and 4 above. There are words in Latin (and English for that matter) whose meaning changes depending on the form of the inflection. In English pant does not have all the same meanings that pants has. The latter can refer to a piece of clothing, but the first cannot. In other words, the material you intend to transclude across pages will be different between many pages. Suppose also that we put a uniform meaning transcluded on all the various forms of Latin unus (“one”). How would that work when some of the forms are singular and others are plural? Yes, Latin has plural forms on the word for "one", as does Spanish. The meaning does not translate the same for the plural forms as for the singular. And this is just one example I could give of many. Consider that pāx means "by permission" in the ablative only. It can mean "the goddess of peace", but only in the singular. The meanings are inextricably linked with the grammar, and they vary between inflected forms. Therefore, you are talking about transclusion of variable content. And do not think that this is limited to a few odd words; every English noun that has countable and uncountable senses is affected by this, since the plural form will only apply to the countable definitions. --EncycloPetey 04:57, 27 December 2007 (UTC)[reply]

It is arguable that pants in that sense is a different "word" entirely as it has a different Etymology, ditto with pace, but bickering aside, I know what you mean. There are some words that are used in a specific sense only when they are used in a certain form. This is not a problem, though it may be an issue, as it means the information might have to be added to the lemma entry (though it could just as easily be added before the transcluded definitions), either hidden with a template {{only|runs|#'''the runs''' diarrhea}}, or marked with a (plural only) tag next to it; the same applies to senses that exist only in the singular. It is entirely possible that someone looks up "run" when they see "the runs" because they think that dictionaries will only put in the singular form; however, I agree that hiding the singular only senses is as useful for plurals as it is easy to do. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

Four pages that currently use this technique are run/experiment, runs/experiment, German and Germans. We shouldn't use it in many more places until there is more consensus for it, otherwise there will be a lot of tidying up to do. Conrad.Irwin 19:43, 27 December 2007 (UTC)[reply]

I am already seeing the biggest problem that will result from this approach. The coding makes editing the page confusing to impossible for new editors. We already have FL editors who are confused by the TTBC sections; transcluding definitions will only worsen the problem. --EncycloPetey 20:07, 27 December 2007 (UTC)[reply]

I agree that this approach seems challenging for those who don't understand wiki syntax. Can that challenge be lessened? Rod (A. Smith) 21:11, 27 December 2007 (UTC)[reply]

The challenge is not much greater than that which we have at the moment; it only involves the addition of <section> tags. Most new users I have been watching/welcoming have started by adding translations or editing definitions, they do not jump in and create entire entries. The only problem with this method, at the moment, is that there is no [edit] link for the definitions. It would be easy to add one, by inserting an empty "====== ======" heading, or even by surrounding the inflection line in them, though I suppose another solution would be to put "======Definitions of `lemma`======" in to avoid all confusion. That could easily be done with a template. Conrad.Irwin 11:15, 28 December 2007 (UTC)[reply]

Conrad, keep in mind a simple rule: The technical ability to do something doesn't mean it is a good idea. It just means it is possible. Whether it is desirable or not is exactly the same as before. And when your solution has a problem, patching on another layer always makes it worse. (No, it isn't "easily done with a template", then a section edit will edit the template.) As I have said, LST might be useful. But it can't transclude any headers, and it will create an issue for editors who don't know what that stuff is there for. We may be able to do something useful with a variant of it (note that it can transclude a section by header name, without tags; not directly useful, but an enhanced version might be), or with some other bot magic, etc. The issue is not the technical means; it is about getting rid of the undesirable lemma/non-lemma distinction lazily copied from the dead tree editions, and, as you point out, having definitions in entries, rather than "plural form of run" which in no possible stretch of logic or semantics can be construed to be a definition of (n) runs. Robert Ullmann 12:14, 28 December 2007 (UTC)[reply]

Robert, keep in mind a simple rule: The fact that traditional dictionaries take an approach does not make that approach lazy or wrong. The lemma/non-lemma distinction is how respectable dictionaries give information about two distinct grammatical entities: a lexeme (whose information is given in the lemma entry) and a specific form of a word. Do you know of a better (less lazy?) place to house lexeme data than in the lemma entry? Rod (A. Smith) 17:39, 28 December 2007 (UTC)[reply]

Rod wrote: For what it's worth, it seems quite appropriate to show "dis-inflection" details in the headword/inflection/grammar line.

Well I disagree. For one thing, even if there were no resulting problems, I don't like the idea. But there ARE resulting problems. First, inflection isn't always a reversible process. In Japanese, yobu (to call) and yomu (to read) have the same past tense (yonda). Of course kanji makes this unambiguous but we should eventually have romaji entries for them. And I doubt Japanese is the only language where different words share inflections. Second, in some languages you can inflect inflections. In Japanese, many conjugations of a verb turn the verb into a new verb which can be re-conjugated. For example yondeiru (to be reading) (which is also to be calling, btw). Such verbs would need a big "inflection" template to contain both their inflection AND dis-inflection info. In particular they'd need a separate inflection template than base verbs. Thirdly, the inflection template applies to all the definition lines below it. But some inflected forms have special meanings which aren't inherited from the lemma.

Conrad linked to run/experiment.

Maybe I'm missing it but nowhere in run/experiment can I find the fact that one sense of run is the past participle of run. It's listed as the past participle in the inflection template, but people won't be looking there to find the definition.

Conrad linked to Germans.

Put yourself in the headspace of someone newly learning English. Such people will "skim" the article trying to understand what they can. I'm speaking from experience having spent hours and hours reading monolingual dictionaries in other languages. All it takes is for them to skip the template because it looks like just another template. Suddenly they're utterly cornholed. The transclusion idea has some merit, I think, if we put the transcluded definition in a colored box and made it painfully obvious it was the entry for the lemma, not the word in question. This would augment, not replace, the original method of defining Germans. I still say it'd be needlessly confusing to ESL users.

Someone wrote (in paraphrase): An entry without a definition is useless and We force people to follow links

I think I disagree. With irregular verbs and such... It's quite likely an ESL reader will know what keep means but not what kept means. Most native English speakers don't know what holp means; do you advocate that we transclude the definition of help into holp? Again, put yourself in the headspace of an ESL speaker. You think the one saved click (in some cases) is worth the intimidation factor of making runs have almost 30 senses? Language Lover 18:31, 28 December 2007 (UTC)[reply]

If we had an entry for よんだ (yonda), our current convention is to create two separate POS sections, one for 呼んだ (yonda, “called”) and one for 読んだ (yonda, “read”). There wouldn't be any problem showing "dis-inflection" for each. Or am I missing something? Rod (A. Smith) 19:24, 28 December 2007 (UTC)[reply]

EncycloPetey said above that the purpose of the proposal is to make it easier on bots. However, that assertion is not true. The intent obviously is to make it easier on readers - transclusion is almost always harder (when not impossible) for bots to interpret (i.e. from an XML dump.)
IMHO, the form of entries should include subsets of the main entry's translation tables, but currently do not. They also are supposed to include the English glosses (but currently only rarely do.)
What I saw at the start of these various proposals, however, seems heretical. (Not what Conrad proposed in response, but rather the initial flurry.) For English entries, the main body of words covered by normal dictionaries is fairly complete, these days. Much of the specialized jargon and obsolete terminology is still missing (for English.) And almost all of our form of entries are currently incomplete. The original proposal seems to indicate a desire to propagate the concept of "incomplete entries" to other languages. That is heresy. Just because one type of entry is often incomplete, is no reason to make other entries useless. I think Conrad's response to that is understandable - in that no such course is sincerely viable. Propagating limitations of paper translation dictionaries is unwise, unwarranted and incomprehensible at this (or any) stage of Wiktionary's development.

As someone else pointed out, just because a paper dictionary does something, doesn't mean that's wrong. As they say in NLP, let's introduce choice, not replace one limitation with another. We can ALWAYS make things MORE detailed if we want. How's this for a new definition of verb sense 1 of run:

To move (that is, to propel oneself) forward (or backward or sideways in some cases) quickly (that is, in a manner full of speed, characterized by a short time lapse between start and destination, etc.) upon two feet (or more, in some cases, and in some exotic cases even on stilts, crutches, etc.) by alternately making a short jump off of either foot, that is, by alternately pushing that foot hard against the ground to propel the rest of the body forward, compare: walk.

See, wiktionary isn't a paper dictionary, so I think this is a better definition..... (sarcasm) Language Lover 21:27, 28 December 2007 (UTC)[reply]

Remember, this is the English Wiktionary for English readers. In theory, when we are "done" (i.e. reach basic equilibrium) all the Wiktionaries will have approximately the same entries, described in their own language. Translating the English entries to other languages will be meaningful when the en.wikt English form of entries themselves, make sense. Reflexively, all form of FL entries on en.wikt need to also be complete entries, to be comprehensible to the English reader. (That's our goal right here, remember?)
--Connel MacKenzie 19:36, 28 December 2007 (UTC)[reply]

Ok, just to get some clarification on your stand: what, in your vision, would be a perfectly "complete" definition for holp (disregarding things like etymology, translations, pronunciation, citations, etc.)? It seems to me that anything much more detailed than what is already there would be rather unbalanced. Language Lover 21:17, 28 December 2007 (UTC)[reply]

Again, I would like to reiterate that 'past participle of run' is not a definition. If we were to follow that train of thought, then "run" should be `defined` also as "simple present of run" (And that is not specifying the exclusion of the third person singular). To define it as the past participle of itself also leads literalists like me to think, "past participle of 'past participle of `past participle of run`'", which does not make any sense.

Instead, the past participle of run (as it is already) should be in the inflection line, or dis-inflection line if that is preferred. If it is felt that it is not shown clearly enough at the moment, then the inflection line should be changed (allowing us to handle the very common case of multiple identical "forms" much more easily and better) perhaps into a section in its own right, or maybe just several lines, I don't see the need for a heading.

Once the inflection / dis-inflection has been properly sorted away from the meaning based definitions, this would leave us with differently (perhaps oddly) looking entries, and so it seemed logical to me to put the definitions back in. Initially I thought that it would be best to define each part of speech separately, and certainly in English, that would be doable - though very time consuming. For foreign languages it is often not possible to give an English definition - we don't have equivalent parts of speech for many foreign parts of speech (Latin gerundive, Spanish Subjunctive and Japanese T-forms have already been mentioned), and many of the forms that we have similar forms for i.e. "third person", become he/she/it and other ugly things in definitions.

So, it seems that we are only going to be able to give the definitions of the lemma of the word, and, as they are already put in at the lemma of the word, it would be foolish to force a human to copy these, and having a bot continually trying to keep all the definitions in sync would not work very well (it would be a very busy bot [high server load], making lots of difficult decisions). A far superior solution is to have the definitions stored and edited in one place, and to automatically transclude them to the other places they are required. There are obviously small usability problems with this solution, the addition of <section> tags to lemma entries, and the addition of {{#section:lemma|definitions}} to the non-lemma entries. It would require no removals or other alterations beyond what was done to remove the dis-inflection/inflection from the definitions section, and that should be done anyway.

The third, and least popular step was the thought that, if we are transcluding the definitions, why not bring all the rest of the information along for the ride. For completely "form of" independent sections, like inflection and conjugation, this is easy; for others such as translations, which are identical in meaning but different in "part of speech" it seems that if this were to gain any popularity we would have to label it as coming from the lemma, which would not be difficult.

I hope that that explains my train of thought. It should be obvious to all of you where the weak links, and possibly (in your opinion :) wrong conclusions are. However I am fairly certain that once we have cracked the problem with the inflection/dis-inflection lines/sections we will need a way to solve the definitions problem. Everything beyond that was just added sugar. Including extra information in the form of definitions, by encouraging people to spend more time there, encourages more information to be added. At the moment, no-one will add anything to form of entries, because the second they get there, they click on the link to go away again. Seeding the entries thus would therefore make Wiktionary a better resource as well as a quicker to use one. Conrad.Irwin 21:08, 29 December 2007 (UTC)[reply]
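For readers unfamiliar with the mechanism discussed above, here is a minimal sketch of the two pieces the proposal relies on: <section> tags wrapped around the definitions on the lemma page, and a {{#section:lemma|definitions}} call on the non-lemma page. The page excerpts, the section name "definitions", and the placeholder definition line are illustrative assumptions, not copies of the actual run/experiment or runs/experiment pages.

```wikitext
<!-- Hypothetical excerpt from the lemma page "run" -->
===Verb===
'''run'''

<section begin=definitions />
# (definitions as they are written on the lemma page)
<section end=definitions />

<!-- Hypothetical excerpt from the non-lemma page "runs" -->
===Verb===
'''runs''' (third-person singular of run)

{{#section:run|definitions}}
```

With this arrangement the definitions would be stored and edited only on the lemma page, and the non-lemma page would display whatever currently sits between the two section tags.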

Here is how holp might look if we stick exactly with the current entry format, and one possible way to make it clearer.

English

Verb

holp (archaic past participle of help)

(transitive) To provide assistance to (someone or something).
He helped his grandfather cook breakfast.
(transitive) To contribute in some way to.
The white paint on the walls helps make the room look brighter.

She was struggling with the groceries, so I offered to help.
(transitive) To avoid; to prevent; to refrain from; to restrain (oneself). Usually used in the negative with can.
We couldn’t help noticing that you were late.

We couldn’t help but notice that you were late.

She’s trying not to smile, but she can’t help herself.

English

Verb

holp

archaic past participle of help

(transitive) To provide assistance to (someone or something).
He helped his grandfather cook breakfast.
(transitive) To contribute in some way to.

...

And how do we know that all of those definitions apply to holp? Since the form is archaic, it may have gone out of use before some of those senses came to exist in English, so you are forcing definitions on forms to which they may not apply. --EncycloPetey 21:20, 29 December 2007 (UTC)[reply]

Why is an example sentence for a different word ("help") on the "holp" page? Isn't that precisely where we'd want to see one illuminating example of the archaic form? (With quotations for verification being in a =Quotations= section or Citations: page, only one example is usually needed there.) --Connel MacKenzie 22:10, 29 December 2007 (UTC)[reply]

The reason I did that entry in that way was to show how it would look were it to use transclusion, as you have said - being an archaic form it may well be that it is one of the outlying cases for which this will not work. Conrad.Irwin 18:56, 30 December 2007 (UTC)[reply]

But this isn't an outlying case. The example sentences are not helpful in understanding the particular form. Thus, transcluding the example sentences will hinder the addition of examples or quotations that demonstrate use of a specific form. This is critical in Latin, where the accusative or genitive has a very different function, grammar, and even shade of meaning from the nominative. If the examples from the lemma page are transcluded, then we no longer have the option to provide case-specific example sentences. --EncycloPetey 19:26, 30 December 2007 (UTC)[reply]

In which case, transclusion won't work (without severe corrections to our entry layout). Sorry for wasting so much of your time, it has been a fun discussion though. I would still like to see the (dis-)inflection moved away from the definitions, perhaps in a section replacing etymology, as it would be derived from the lexeme which is represented at the lemma. This would then leave a space so that people are encouraged to write anything, as opposed to the current situation where the "form of" entries look complete (with the only purpose as a soft redirect) Conrad.Irwin 12:24, 31 December 2007 (UTC)[reply]

{{colloquial}} and {{informal}} tags

A third installment of the discussion on this page here and here.

Since the devil is always in the details, I wish to have discussion regarding the application of these tags to specific entries to gain consensus of our community's view, and to clarify the issue for the future. There will be disagreement. Please voice it. It is my opinion that both tags should be preserved, while at least one editor proposed doing away with {{colloquial}} entirely. As the terms are nearly interchangeable, discussion should center around which aspects of the tag do the most service to the entry. The unique implications of each tag are so far the basis for keeping them. -- Thisis0 15:11, 22 December 2007 (UTC)[reply]

Thank you for restating this. I simply wish to state that I agree with your summary here; these are good examples that highlight the difficulty in strictly defining what is meant by the different tags. --Connel MacKenzie 19:31, 23 December 2007 (UTC)[reply]

Working definitions (revisable):

colloquial — categorizes words, written or spoken, that are standard and not slang, but likely arose via casual conversational English, and are likely to be used primarily in casual conversation rather than in more formal written works, speeches, and discourse.
informal — categorizes spoken or written words that are used primarily in a familiar context, where a clear, formal equivalent exists that is frequently employed in its place in formal contexts.

Specific entries (please add and comment)

belly button
I say {{informal}}. Formal is navel. Informal is clear in cases like this where the definition non-descriptively states the alternative. - Thisis0 15:11, 22 December 2007 (UTC)[reply]

boo-boo
I say {{colloquial}}. It's easy to see the conversational origin of this word (babytalk). Obviously, formal alternatives exist, but none is an exact substitute (injury, mistake, flesh wound?) - Thisis0 15:11, 22 December 2007 (UTC)[reply]

grandpa, et al.
{{informal}} is correct here. Though it's clear it came about by casual speech, these words have a clear formal equivalent, (grandfather, et al.), so the informal tag does more service here. - Thisis0 15:11, 22 December 2007 (UTC)[reply]

righto
Change to {{colloquial}}. Obviously conversational in origin, and the equivalent given is also an informal (colloquial) term, okay. - Thisis0 15:11, 22 December 2007 (UTC)[reply]

shorty
Should be {{colloquial}}. Casual conversation origin; no real formal equivalent functioning word for "short person". Missing definitions for shorty would also be {{colloquial}}. - Thisis0 15:11, 22 December 2007 (UTC)[reply]

throw down
I'm just sayin'... could you really see tagging this informal? C'mon... colloquial has a function. (I guess mostly 'cause we don't like the word slang.) - Thisis0 15:11, 22 December 2007 (UTC)[reply]

gonna, gotta, wanna
How could we get rid of {{colloquial}}? - Thisis0 15:11, 22 December 2007 (UTC)[reply]

squits
{{informal}} is correct here. No way could it be colloquial - Algrif 11:50, 27 December 2007 (UTC)[reply]
I agree that informal is the most effective tag here, referring to the direct relationship with the word's formal equivalent, diarrhea, but I don't comprehend your blanket assertion "No way it could be colloquial." How is it not obvious this word also came about through casual conversation? -- Thisis0 00:18, 30 December 2007 (UTC)[reply]

shakespeare

I need help in sourcing specific phrases from Shakespeare plays. Can I do this at this site? The phrase I want is: "like the toad, ugly and venomous wears yet a precious jewel in its head" Alive 2 alivezero@yahoo.com

See William Shakespeare's As You Like It at Wikisource, scene I. You should take up the rest of your research at http://en.wikisource.org. Also, http://www.google.com is a good friend to have. Happy hunting! -- Thisis0 00:41, 23 December 2007 (UTC)[reply]

Hey, don't forget Wikiquote! bd2412 T 11:36, 23 December 2007 (UTC)[reply]

About Lingala - needs some input

This is a request for people to just look at this page and correct any glaring errors. I have been trying to help Rp2 over IRC, but it is a case of the blind leading the blind. Even if you know nothing about Lingala it is entirely possible you can see something that should be done in a more Wiktionary way. The more mistakes are being made, the more things will need fixing, so please, all you gurus, speak now. Conrad.Irwin 01:05, 23 December 2007 (UTC)[reply]

A possible input: the Wiktionnaire (more than 2000 Lingala entries). Lmaltier 16:03, 23 December 2007 (UTC)[reply]

fr:Category:lingala gives the list. Conrad.Irwin 16:20, 23 December 2007 (UTC)[reply]

Shhh ... talk about wasting effort. I'm both delighted and discouraged now. I'll see what I can do with it. Rp2 00:27, 24 December 2007 (UTC)[reply]

Wiktionary style guide

WT:STYLE and Wiktionary:Style guide both redirect to User:Pathoschild/Wiktionary:Style guide - is /was this intended to be temporary? —Saltmarsh^Talk 05:24, 23 December 2007 (UTC)[reply]

No, but it is an aborted project, apparently. --EncycloPetey 05:26, 23 December 2007 (UTC)[reply]

New idea for etymology section

I just had an idea for another way to do the etymology section. If you're interested, take a look at 别来无恙. Do you think the style of the etymology section makes it easier or harder for the person looking at this entry? -- A-cai 13:37, 24 December 2007 (UTC)[reply]

The superscript makes reading the hanzi characters a little difficult. Took me a minute to figure out that 无恙 was listed first, but comes at the end of the phrase. — [ ric ] opiaterein — 17:59, 24 December 2007 (UTC)[reply]

How about like this:

Etymology

Literally:

	无恙	来		别
(you look like you are)	in good health	since	(we last)	parted	(company)

(original rearranged according to English word order)

Is this better? -- A-cai 23:51, 24 December 2007 (UTC)[reply]

That actually is pretty bitchin' :D — [ ric ] opiaterein — 23:58, 24 December 2007 (UTC)[reply]

The idea is good, but I really dislike the table - maybe it is just my browser (FF on Linux), but the table lines stand out more than the content. Conrad.Irwin 12:58, 25 December 2007 (UTC)[reply]

Why change the word order? Seems to me to be a lot further from the literal meaning (来 = "since"?)

(since) 别 other + 来 returning (coming) + 无恙 (you look in) good health

the table is overkill, much easier to read in-line. Robert Ullmann 13:31, 25 December 2007 (UTC)[reply]

The word order is switched because it's giving the literal (or closest literal) translation in English. You wouldn't say Company parted we last since in good health, which is more or less the way it's formed in the original phrase. You can't expect to put the words in the same order as English and expect it to work the same way. — [ ric ] opiaterein — 14:01, 25 December 2007 (UTC)[reply]

Sure you can, English is pretty flexible. And it doesn't have to be perfectly grammatical; as it is, it isn't even correct (there is no "parted company" strictly). If you want a grammatical sentence:

Since the other 别 time you came 来, you look to have been in good health 无恙.

But I think the previous version makes the literal meaning of the characters and the interpolated words clearer. Robert Ullmann 14:20, 25 December 2007 (UTC)[reply]

With all due respect (again) you're not fluent in Mandarin. I for one will be leaving literal meanings etc. to A-Cai, who obviously has a better idea of what's going on in Mandarin than we do. :) — [ ric ] opiaterein — 15:05, 25 December 2007 (UTC)[reply]

Robert, ric is correct. In this case imitating the Chinese word order would be more confusing than helpful. I think part of your argument is based on a misunderstanding of the individual words. So before going any further, let's clear that up.

别 - The most common meaning of this character is other. However, in this case it is a verb, meaning to part (as in to part company).

来 - The most common meaning of this character is to come. However, in this case it is a postposition, meaning since.

无恙 - literally means without a scratch (injury), but a more colloquial translation of in good health is appropriate here.

Hopefully, the above clarifies the function of each word. The problem is that in English, the word since is a preposition (before the verb), whereas in Chinese, it is a postposition (after the verb).

In conclusion, I would like to find a way to use English word order (when necessary), but still indicate which English words go with which Chinese words. The English words in parentheses are implied, but not explicitly stated in the original. Leaving them out would make for a very terse and awkward English sentence, despite the fact that those words do not have a direct counterpart in the Chinese phrase.

Robert, you don't like the tables. Now that you have read my above explanation, do you have any ideas for a format (I was hoping for some kind of template maybe) that could accommodate the above situation? -- A-cai 23:37, 25 December 2007 (UTC)[reply]

IMO A-cai's new way of presenting English with superscript/Ruby glosses, although interesting and creative, proves confusing in the end. Although it seems on the surface to make sense by putting the English translation in normal English order, with Chinese character glosses, this actually creates more confusion by artificially imposing another language's structure on the original phrase. It's probably best to present the etymology by giving each Chinese character in its original order, followed by the English translation (specifically, the most correct translation of that character as it is used in the phrase's particular context). The reader can easily reconstruct the grammar by rearranging the components of the phrase in his/her mind, while reading the English translations of each of the Chinese characters, in the original Chinese order of the phrase. 24.93.170.200 05:28, 26 December 2007 (UTC)[reply]

24.93.170.200, you seem to prefer the following format:

Etymology

别 (to part company) + 来 (since; from) + 无恙 (in good health)

While I agree that the above is just fine in many cases, there are times when it simply does not make sense to explain things in Chinese word order. Anyway, you already have the Chinese word order (the title of the entry itself). What I'm attempting to explain is how it maps to English word order.

Here is another idea:

Etymology

Literally: since¹ (we last) parted² (company), (you look like you are) in good health³

¹ 来 (since)
² 别 (to part company)
³ 无恙 (literally: without a scratch or blemish)

What do you think of this? -- A-cai 06:10, 26 December 2007 (UTC)[reply]

I think all of your ideas are interesting, and at first glance I like the look of this format, but rearranging the order of the Chinese characters (as this system would seem to do as well) in the "Etymology" section doesn't seem to be a good idea, as it's the Chinese phrase that is being described in the entry. In other words, it seems that the Chinese word order should be preserved in the entry's title, inflection line, trad./simplified boxes, and etymology, even if the phrase/sentence structure doesn't match the English translation's phrase/sentence structure. The reader will see the English sentence structure in the definition line, in any case. 24.93.170.200 07:14, 26 December 2007 (UTC)[reply]

Here is an example of a case where the literal meaning would be valuable in the etymology, but not in the definition line:

三个臭皮匠，胜过诸葛亮.

By putting the literal meaning of the above phrase on the definition line, we would be forcing a square peg into a round hole. -- A-cai 08:11, 26 December 2007 (UTC)[reply]

I don't see him saying that the definition line should be doing that; rather that it is what it is: the "normal" English sentence structure in the definition line. We're just talking about the etymology. Robert Ullmann 09:15, 26 December 2007 (UTC)[reply]

I think Robert Ullmann's observation is most apt. That is, the idiom is described in the definition section, but the literal translation can be described in the etymology section (without rearranging the sequence.) This obviously doesn't apply for simple structures where the literal translation matches the idiomatic translation. Saying "(You look like you are) without a scratch or blemish since (we last) parted (company)" in the etymology gives tremendous insight into the grammatical construct; that is almost more helpful than the resultant idiomatic translation. --Connel MacKenzie 08:05, 26 December 2007 (UTC)[reply]

A-cai, Ric is correct that I should defer to you on the meanings being used; my point is that you don't have to, and should not, change it to "English word order" (in the Etymology). English is much more malleable than that; you can always put things in whatever order you need:

having parted company 别, since 来 then you look to be in good health 无恙

(or breaking the last two down into not and ill, but this is fine) English just doesn't have strong "word order" in the way Mandarin (apparently) does: Take the words "Rogers" (a baseball player), "second", and "base". Three words, so 6 possible word orders. Now listen to the play-by-play announcer, and you may hear:

Rogers is on second base
and second base is Rogers
at second, Rogers is on base
and the bases: Rogers second, Jackson third
the base runners: at second is Rogers
Rogers is on base at second

All 6, all perfectly grammatical. (;-) 'Tis better not to rearrange the words. Some additional format might be helpful, but just the plain in-line that we have been using everywhere seems fine. IMHO: we don't want tables or superscript tags or anything like that. Robert Ullmann 09:15, 26 December 2007 (UTC)[reply]

having parted company, since then you look to be in good health (this sounds very clunky to me, even though it is grammatical). -- A-cai 09:53, 26 December 2007 (UTC)[reply]

Connel and Robert, I understand your point about trying to make the literal definition match the word order of the original. I think you did a good job of getting the literal translation closer to this goal. However, even in your translation, the "since (we last) parted (company)" is actually the reverse of the original. In order to be completely faithful to the original word order, you would need to say:

(we last) parted (company) since, (You look like you are) without a scratch or blemish

In most cases, English and Chinese word order can be creatively reconciled so that it is not a major issue. However, there are some cases where I would like to give a sense of the sentence structure of the original as well (for cases where it is not obvious).

Here is another example:

英雄所見略同

英雄 - heroes

所 - (that/upon) which: when 所 is followed by a verb, 所 warns the reader that the verb will be followed by an object. The object need not be explicitly included in the sentence. An example: 他所写的信 (the letter which he writes)

見 - to view (an issue in a certain way); to have an opinion about

略 - roughly

同 - alike

If I put my English sentence in Chinese word order, I get an incorrect or misleading translation:

(incorrect) Heroes which hold views are roughly alike.

Let's try that in English word order:

(correct) (Opinions) upon which heroes hold views are roughly alike.

It's not the heroes, but the opinions of the heroes that are being described as alike. Sometimes, there just isn't a good way to reconcile English and Chinese word order, while still accurately conveying the overall meaning. -- A-cai 09:47, 26 December 2007 (UTC)[reply]

No, you are missing the point. As a fluent native speaker, I can tell you there is no "sometimes isn't". You can always write an English sentence in the order you want. (And the example above may sound clunky to you, but it isn't. ;-) This case is almost trivial, you just need to write it properly, in this case with reduplication of the object:

Heroes 英雄 which 所 hold views 見, hold views that are roughly 略 alike 同.

Here is where you prove my point about word order being misleading. In the original, which is intended to modify views, not heroes. The correct English would be: views which are held by heroes. -- A-cai 12:15, 26 December 2007 (UTC)[reply]

Which is better and clearer English than your "correct" example. And not in any way clunky; you might write it this way in English without any reference to Mandarin. OTOH, you don't need to make a grammatical sentence either; it is better to keep the order. Since we are talking about this, I might suggest it would help if you don't apply the Chinese concept of "word order" to English, since it doesn't apply. (Note that I used "since" both pre- and post- position in that sentence.) Don't worry if the English is grammatical or in the "right" word order; there isn't any "right" word order ... Robert Ullmann 10:59, 26 December 2007 (UTC)[reply]

That's a very radical approach to English grammar. Think that's anyway what I. :-) -- Visviva 11:05, 26 December 2007 (UTC)[reply]

Personally, I agree with Opiaterein's assessment above that the table version is "pretty bitchin'." It presents the information in a much more accessible way than we have been doing heretofore. (However, I think it would probably be somewhat easier on the eyes if it used the wikitable class.)
More generally, when it comes to complex and poorly-explored areas like describing the etymology of words and phrases in highly foreign languages, I think we should encourage a diversity of approaches. This is not an area where absolute conformity among entries is necessary, or even desirable in the short term. -- Visviva 11:05, 26 December 2007 (UTC)[reply]

Thanks, Visviva. I was beginning to think that I was losing my mind. When doing an English translation (even for a literal etymology section), I happen to think that it is important to write in a natural English style. The readers of Wiktionary are English speakers, and they are generally going to expect English grammar and English sentence structure in an English explanation. Yes, English word order can be manipulated to a certain extent, but let's not jump off the deep end. At a certain point, you start to sacrifice comprehension for sake of maintaining the structure of the original. What I'm trying to do is find a way to let you have your cake and eat it too. In other words, three things should be conveyed for the above phrases:

1. the meanings of each individual component in the phrase (etymology)
2. the literal meaning of the entire phrase in idiomatic English (etymology)
3. the actual meaning of the entire phrase in idiomatic English (definition)

Most of the time, Chinese grammar allows for 1 and 2 to be combined. However, I still think that there are cases where it makes more sense to include all three. -- A-cai 12:07, 26 December 2007 (UTC)[reply]

The "good English translation, in good English word order" will appear in English as the definition line (without Chinese characters). This is why I stated that it is better to keep the etymology (with English definitions of each character or multi-character Chinese word) in the original Chinese word order. One can rearrange the grammatical blocks in one's mind to make it "good English word order" because the "good English word order" will already appear as the English translation of the Chinese phrase in the definition line. Thus, the actual meaning (number 3 above) will appear on the definition line. Regarding the literal meaning of the entire phrase in idiomatic English, I think that can be construed by comparing the English translation with the etymology (which can include this); in fact, the etymology section could be as detailed as we want for any character, word, or phrase. 24.93.170.200 19:14, 26 December 2007 (UTC)[reply]

Using a confusing word order in the Etymology section so as not to "upset" the original phrase is silly. The point of it is to show you what it literally means in English, not in Chinese grammar using English words. The "good English word order" will only be used in the definition line if the English equivalent is the same as the literal meaning given in the etymology. The literal meaning is a translation. The Chinese characters are given (out of order) so you can see what corresponds to what. Just because putting them in one order in Chinese is better doesn't mean that the same word order will be ideal in English. On another note, it's very sad that this discussion has been turned into a debate about word order. Very sad, indeed. — [ ric ] opiaterein — 19:24, 26 December 2007 (UTC)[reply]

Thanks, ric. I couldn't have said it any better. I find everyone's feedback useful, even feedback which I consider to be uninformed (not pointing any fingers). I would like for Wiktionary entries to accommodate people's needs at every language level. That means we have to find creative ways to overcome challenges not faced by contributors to most other dictionaries.

Here is another one for consideration: 曾經滄海難為水. The current etymology section leaves out fine points about classical Chinese grammar. This phrase would not be learned until one is fairly advanced in the language. For such a person, my explanation is beyond sufficient (such a person should already be familiar with classical grammar). For 24.93.170.200, he may be confused about the syntax of 難為水. Someone like him is likely to look up each character 難, 為 and 水. The problem is that the current state of these pages doesn't really provide much help at this point. I'm open to suggestions, which is why I'm discussing it here. However, this discussion is beginning to indicate that we can't please everybody 100% of the time. -- A-cai 23:19, 26 December 2007 (UTC)[reply]

Of the three things you are trying to convey (building off your example above), (2) will usually be obvious from (1) - when it isn't, it should be described in the definition section, not in the etymology section. In your latest example, it doesn't make much sense to omit (1) at the expense of (2) - particularly when the individual characters are not redlinked, for some reason. Your and ric's opinion that the original Chinese should be obfuscated/translated in the etymology doesn't make much sense to me...conveying the English meaning is what the definition section is for. A definition line of # {{idiom|lang=cmn}} [[been there, done that]] ''(literally: having crossed the vast oceans, one can no longer take a river seriously.)'' seems to convey (2) & (3) a little more clearly. On a completely different note, pretty much every formatting thing that has tried to use tables/boxes has met significant (justified) resistance. Adding pretty tables in the etymology section will obviously be harder to parse (instead: ignored) by most (if not all) wikitext uses. Additionally, that needlessly raises the barrier to new contributors, where plain text works just as well (if not better, for example, on some of my preferred browsers.) Collisions with Wikipedia boxes and images are still other reasons to avoid gratuitous table-formatting. --Connel MacKenzie 01:47, 27 December 2007 (UTC)[reply]

In case it was missed, I did say, above, "in fact, the etymology section could be as detailed as we want for any character, word, or phrase."--and I meant it. If it needs to be explained, in another sentence in the "Etymology" section, that 水 in this context means "river" and not "water," this should be added. (In fact, this additional definition for 水--as in the term 山水, which is used in China, Korea, Japan, and Vietnam--should be in the 水 article as well, but isn't yet). I do believe that longer phrases call for a longer and more descriptive etymology section, going into more detail than is usual for, say, a two-character word. I think we all agree on this. 24.93.170.200 03:24, 27 December 2007 (UTC)[reply]

To summarize:

the use of tables received mixed reviews.
I'm not completely sold on tables either. They are much more unwieldy than straight text.
When necessary, the entry should provide both the literal translation and the actual meaning.
Some think that the literal meaning should be in the definition section, not the etymology section. Personally, I really don't think this is a huge issue in the scheme of things. One problem with the definition section is that it is less free form than the etymology, which makes it harder for me to provide context to the literal translation (if needed).
English "word order" should/should not reflect the original.
translation between two vastly different languages is more of an art than science. You can play around with English grammar and syntax to a certain extent, but it can be counter-productive, if taken to an extreme.

If I missed anything, please let me know. I need to think about all of this. Many of the phrases that I have introduced to Wiktionary are absent from other Chinese-English dictionaries. There is not a lot of precedent for what we are attempting. -- A-cai 04:44, 27 December 2007 (UTC)[reply]

Etymology cont.

I have modified 别来无恙 so that it no longer uses a table. I think this footnote approach would be helpful for some phrases, especially things like: 和尚打伞，无法无天. -- A-cai 05:03, 27 December 2007 (UTC)[reply]

MonoBot

Hi, I'd like to get approval to run MonoBot, an interwiki bot. The bot currently runs interwiki.py on da, de, el, es, fi, fr, ga, gd, hu, io, it, ku, la, pl, pt, ru, simple, sv, te, tr, vi, & zh wiktionaries. It's flagged (for a different task) on the Spanish Wikipedia and it also runs on the Spanish Wikinews. Thanks, Monobi 17:08, 25 December 2007 (UTC)[reply]

Running the broken version from the pywikipediabot framework? No thank you. --Connel MacKenzie 07:53, 26 December 2007 (UTC)[reply]

What would this do that Interwicket doesn't? Conrad.Irwin 11:48, 26 December 2007 (UTC)[reply]

Accents on Neapolitan entries

I noticed that the Neapolitan word "vàso" is listed at vaso instead of being located at the accented page name. I looked at Category:Neapolitan nouns and it looks like there is a mix of pages at accented and non-accented titles (cf. cetro and quistùra). Does anyone know what convention is being used for this language? I don't know who works on Neapolitan and I didn't see Wiktionary:About Neapolitan, so I thought I'd ask here before splitting "vàso" off into its own page. Mike Dillon 22:06, 25 December 2007 (UTC)[reply]

The only languages I've seen that done for are Latin and Old English. I don't know much about Neapolitan, but all other Romance languages get accents and diacritics in their page names, so I'd say go for it. — [ ric ] opiaterein — 22:14, 25 December 2007 (UTC)[reply]

One slightly funky thing is that cetro has an interwiki to it:cetro (which doesn't have the accent in the pagename). Mike Dillon 23:01, 25 December 2007 (UTC)[reply]

I suppose it's possible that Neapolitan uses accents only to mark the stressed syllable. It might not even have an official writing system. I'm really not sure. lol... One thing I've noticed is that some other Wiktionaries (like (especially) the Hungarian one) are kinda subpar, lazy, etc. It could just be that the person who started the article didn't feel like adding the accented e in the page name. If it were up to me, I'd include the marks in Neapolitan entry titles. (But I'd check out the Wikipedia article on the language first.) — [ ric ] opiaterein — 23:18, 25 December 2007 (UTC)[reply]

Unlike the other Romance languages, the Italic languages have a tendency to only accent a final stressed syllable, as in città. Otherwise, the accent is usually not written (e.g., piccolo). However, in dictionaries, they do put the accents, in the same way that English dictionaries use diacritics such as the macron and diaeresis. The problem is that many nonnative speakers get the impression that the dictionary accents are part of the standard orthography, and they aren’t. Italian does on occasion use other accents such as the acute accent (e.g., perché), and Neapolitan makes more use than Italian does. We should be careful to use only standard orthography (or as standard as unregulated Neapolitan gets), and not the "dictionary" orthography that only native Italians know when to apply or ignore. You might look at some Neapolitan pages such as w:nap:Lengua napulitana to get the idea. —Stephen 06:02, 26 December 2007 (UTC)[reply]

Requirements for adminship

Voting on Keene's adminship, people attacked him for allegedly not clocking in enough time. If that's important enough people are using it as a metric for adminship, let's codify it. Lest it be an "excuse" against people for Spanish-soap-opera reasons.

If not important enough, let's acknowledge that. So, it can remain a "reason" for votes (policy can't dictate how or why people vote). But one such voter shouldn't necessarily attract followers. A person shouldn't say "We can't admin her, she lacks enough experience". Unless there's actually such policy. Language Lover 18:04, 26 December 2007 (UTC)[reply]

Are you a Barack Obama supporter? :) — [ ric ] opiaterein — 18:07, 26 December 2007 (UTC)[reply]

So you are advocating that people who lack experience should be admins? Or that evidence should not be presented? Or what, exactly? Your reasoning strikes me as tendentious. There is no need to codify every potential action with policy. --EncycloPetey 22:54, 26 December 2007 (UTC)[reply]

I'm advocating that if there's a de-facto standard, let's make an official standard. Apparently people were looking at Keene's contribs and somehow making conclusions, can those who did this, tell us exactly how that works? Language Lover 18:13, 27 December 2007 (UTC)[reply]

So, you want us to explain how we look at data and draw conclusions? --EncycloPetey 18:14, 27 December 2007 (UTC)[reply]

Well, now that you put it that way... sure! That would be very awesome. Definitely in line with the whole philosophy of transparency and openness of Wikipedia. :) Language Lover 17:50, 28 December 2007 (UTC)[reply]

My understanding is that there are only two requirements for adminship. 1) You must agree to do it. 2) You get a good majority of "pro" votes. How people decide to vote is up to them. SemperBlotto 22:50, 27 December 2007 (UTC)[reply]

Templates in the language drop down

Moved to Wiktionary:Grease pit#Templates_in_the_language_drop_down, please reply there.

When you edit a page, the language drop down includes templates for English only- the other languages just have special characters. Would it be possible to include language specific templates in there too? For example, Spanish would have {{es-noun-m}}, {{es-adj}}, {{es-conj-ar}}, etc.? Nadando 19:06, 27 December 2007 (UTC)[reply]

That is possible to do, but MediaWiki:Edittools' size is a significant factor. --Connel MacKenzie 19:16, 27 December 2007 (UTC)[reply]

I recently asked for them on the Edittools talk page and someone immediately requested French ones as well. It would be nice if the basic templates for all active en.wikt languages were there. It would probably help newbies make more standard articles. Is there any way to keep the size in check and still do this? --Bequw → ¢ • τ 22:21, 27 December 2007 (UTC)[reply]

No. There are just way too many templates in too many languages. Is there a way to customize the Edittools? Say, to have an optional section that pulls up a user-generated list of items from a specified subpage dependent on the editor's username? --EncycloPetey 23:14, 27 December 2007 (UTC)[reply]

That is a great idea. We could even add languages to WT:PREFs, to make it easy for people to check the ones they need. (I really dislike the enormity of edittools). Conrad.Irwin 23:23, 27 December 2007 (UTC)[reply]

Erm, what? Where would you add them then and how would &uselang= toggle it? You'd only be making it still larger and hiding random subsets? --Connel MacKenzie 23:35, 27 December 2007 (UTC)[reply]

Not if it's set up to generate a personalized list and only call that personalized list. I imagine a version of Edittools that only loads the items a user has checked off as wanting to use, plus a personalized section listing things not normally found. I, for example, never need the Arabic, Hebrew, and Welsh sections, so having them load every time is wasteful. But there are editors who find those sections very useful. On the other hand, I'd like to be able to plug in the Latin inflection, declension, and conjugation templates so that I don't always have to go look them up for unusual forms. Very few other editors would find these templates useful, but if I could have a customized version, then no one else would be bothered. Now, I'm not fluent in how the Tools are coded and loaded, so I couldn't say whether any of this is truly feasible, but it seems that there ought to be a way to have (1) a customizing tool, drawing from standard list sets as currently in the Tools, (2) a standard way and place to save the customized content in the editor's user namespace, and (3) a setting that makes Edittools call the customized content if it exists, or else fall back on the default content (which could be set much smaller). --EncycloPetey 01:58, 28 December 2007 (UTC)[reply]

Hmmm. Separate http ajax requests for each Edittools section? Might be a bit too much, both for broadband connections and for the poor WMF servers. (Ajax requests generally are not cached at all - so they'd reload for every page load.) Even 13 or so supplemental Edittools pages (by language family) would be enormously taxing. (You can see part of that problem, if you turn on the WT:PREF for "Allow special characters to be input into the search box" - which does use ajax that way on the single Edittools page.) Move this (soft-linked) to WT:GP? --Connel MacKenzie 20:02, 28 December 2007 (UTC)[reply]

Not exactly what I was thinking. Not a separate request for each section, but the option to create a single file per user (with internal sections) that is then called when that user begins to edit. The user would have one section of the file reserved for personally desired items not in the master template from which all such user files are originally generated. That is, there would be a huge master file (never called), which would have an associated tool from which an editor could "shop" for those sections he or she would find most useful. The tool then saves only those sections to the user's personal Edittools, and this personalized version is what gets loaded. This would only apply to registered users who have an account. Is something like that feasible? --EncycloPetey 20:29, 28 December 2007 (UTC)[reply]
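A minimal sketch of the scheme described above: a master list that is never loaded directly, a per-user selection plus a personal section, and a small default as fallback. It is written in Python purely for illustration and does not reflect how MediaWiki:Edittools is actually implemented. Every name here (`MASTER_SECTIONS`, `DEFAULT_SECTIONS`, `build_edittools`, the `prefs` layout, and the placeholder `{{my-latin-template}}`) is hypothetical; only the three Spanish template names come from the discussion above.

```python
# Hypothetical master list: section name -> insertable items.
# Contents are illustrative only; the Spanish templates are the ones named above.
MASTER_SECTIONS = {
    "Basic": ["[[]]", "{{}}", "''"],
    "Spanish": ["{{es-noun-m}}", "{{es-adj}}", "{{es-conj-ar}}"],
    "Welsh": ["ŵ", "ŷ"],
    "Hebrew": ["א", "ב"],
}

# Assumed small default for anonymous users or users with no saved selection.
DEFAULT_SECTIONS = ("Basic",)


def build_edittools(prefs):
    """Return the sections to load for one editor.

    `prefs` is None (no saved selection) or a dict with:
      "sections": names picked from the master list
      "custom":   personal items that are not in the master list
    """
    if prefs is None:
        chosen, custom = DEFAULT_SECTIONS, []
    else:
        chosen = prefs.get("sections", DEFAULT_SECTIONS)
        custom = prefs.get("custom", [])

    # Only the chosen sections are copied; unknown names are skipped.
    tools = {name: list(MASTER_SECTIONS[name])
             for name in chosen if name in MASTER_SECTIONS}

    # Personal items live in their own section of the user's file.
    if custom:
        tools["Personal"] = list(custom)
    return tools


# Usage: an editor who wants Spanish plus one personal (placeholder) template.
example = build_edittools({"sections": ["Spanish", "Klingon"],
                           "custom": ["{{my-latin-template}}"]})
```

The design point is that only the user's chosen subset is ever assembled, so the size of the master list does not affect what an individual editor loads.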

Hmmm. Stored as text within a user Javascript page (kind of like the patrolling JS) might cache better. If/when wiktionarydev comes back, it might be worth an experiment or two. :-) --Connel MacKenzie 21:08, 28 December 2007 (UTC)[reply]

I actually think it'd be a really good idea, if we weren't trying to throw in every single template for whatever particular language it is. It might help with people using the wrong templates, or not using them or whatever. — [ ric ] opiaterein — 00:52, 28 December 2007 (UTC)[reply]

That would be a great idea, not just for language-specific templates, but also for various (generally) rarely used scripts that are not likely ever to make it to Edittools (e.g. Early Cyrillic, Glagolitic, cuneiform syllabograms, Lycian, Lydian..). At present, the average contributor finds 90% of MediaWiki:Edittools content completely useless. BTW, is there a way to disable loading Edittools when editing pages? It sucks up bandwidth ^_^ --Ivan Štambuk 21:01, 28 December 2007 (UTC)[reply]

Can we please move this to WT:GP? --Connel MacKenzie 21:08, 28 December 2007 (UTC)[reply]


Help for editors

Sorry if this was discussed before, but could editors be helped with reminders (pop-up messages) about syntax when the entry is previewed or saved? The code could check for standard header names, correct levels of each header, correct order of the sections, existence of translation table if someone enters a translation, existence of gloss in trans-top, etc. Panda10 23:46, 27 December 2007 (UTC)[reply]
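A rough sketch of the kind of check suggested above, covering only two of the listed items: header names and header levels. It is written in Python for illustration and is not an existing Wiktionary tool. `KNOWN_HEADERS` is a small made-up subset, not the project's real list, and `check_headers` is a hypothetical helper.

```python
import re

# A wikitext header line such as "===Noun===".
HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

# Illustrative subset only (assumption); the real set of standard headers is larger.
KNOWN_HEADERS = {"Etymology", "Pronunciation", "Noun", "Verb",
                 "Adjective", "Translations", "Synonyms"}


def check_headers(wikitext):
    """Return a list of warning strings about header syntax."""
    warnings = []
    prev_level = 1  # treat the page title as level 1
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADER_RE.match(line)
        if not match:
            continue
        opening, title, closing = match.groups()
        if len(opening) != len(closing):
            warnings.append(f"line {lineno}: unbalanced '=' in header {title!r}")
            continue
        level = len(opening)
        # A header may go at most one level deeper than the previous one.
        if level > prev_level + 1:
            warnings.append(
                f"line {lineno}: header {title!r} skips from level {prev_level} to {level}")
        # Level 2 is assumed to hold language names, so names are checked from level 3 down.
        if level >= 3 and title not in KNOWN_HEADERS:
            warnings.append(f"line {lineno}: non-standard header {title!r}")
        prev_level = level
    return warnings


sample = "==English==\n===Noun===\n=====Translations=====\n===Nuon===\n"
problems = check_headers(sample)
```

On the sample, the checker reports the jump from level 3 to level 5 and the misspelled `Nuon` header. Section order and translation-table checks would need further rules along the same lines.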

Different approaches to accomplishing that (a veritable holy grail) are discussed on WT:GP. Recently too many more trivial issues have made some of those discussions drop off (temporarily, I think.) A nice proof-of-concept would be to disable the [save] button (and warn why) for NS:0 entries when there is no language specified. --Connel MacKenzie 19:53, 28 December 2007 (UTC)[reply]
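The proof-of-concept mentioned above (refuse to save a main-namespace entry that has no language specified) could look something like the following. This is a sketch in Python for illustration only; the real check would have to run in the edit form. It assumes a language is specified by a level-2 header such as `==English==`, and `can_save` is a hypothetical name.

```python
import re

# A level-2 header (exactly two '=' on each side), assumed to name the language.
LANGUAGE_HEADER_RE = re.compile(r"^==[ \t]*[^=\s][^=\n]*==[ \t]*$", re.MULTILINE)


def can_save(namespace, wikitext):
    """Return (allowed, reason). Only namespace 0 (main entries) is checked."""
    if namespace != 0:
        return True, ""
    if LANGUAGE_HEADER_RE.search(wikitext):
        return True, ""
    return False, "No language header (for example ==English==) was found."


allowed, reason = can_save(0, "===Noun===\n# a word")
```

Here `allowed` is `False` because the text has a part-of-speech header but no language header; the `reason` string is what the edit form could show next to the disabled save button.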

Could you point me to the title of the discussion on WT:GP? I was not able to locate it. Panda10 21:46, 28 December 2007 (UTC)[reply]

Oh, sorry. The "Wizards" discussion was here in the Beer Parlour (archived?) but was supposed to be in WT:GP. The concept has been tossed around quite a bit on irc://irc.freenode.net/wiktionary as well. --Connel MacKenzie 22:00, 28 December 2007 (UTC)[reply]

Request

Is this the right page for a request like this? Could you do a concordance or frequency list for song lyrics? I'm thinking of something like Frank Zappa. (This unsigned comment was added by Hailey C. Shannon.)

We do have some concordances, for Shakespeare, the Sherlock Holmes stories, some H. G. Wells novels, and A Clockwork Orange, that I know of. These are all linked word-lists. Most frequency lists we have are random surveys either from on-line book collections or a random survey of collected web postings. We even have a Concordance: namespace set aside for concordances. --EncycloPetey 19:52, 28 December 2007 (UTC)[reply]

Blocking policy discussion.

See http://lists.wikimedia.org/pipermail/foundation-l/2007-December/036695.html -- Jeandré, 2007-12-29t15:58z

It is very sad to see other projects engaging in this type of petty sniping. I remain confident that WMF (and foundation-l) can see through such obvious untruths, easily. Mis-linking to an almost irrelevant side-discussion, flat-out lying about block durations, completely ignoring that the individual blocks in question are easily justified, completely mis-characterizing discussions - I have to wonder what point is being attempted. Does en.wikinews now believe that all projects should be administered exactly the same way, with exactly identical policies? Or is there some specific grudge being danced around?

Assuming that it is simply a love for {{test}} et al., that has distorted the outside view, I'll try to describe how I think the Wikinews misconception evolved. Keep in mind, Wikipedia (with 1,000+ sysops) is the first stop for someone learning about the WMF. For disruptive editors there, they often are told their "trivia" additions are not welcome, or that their short descriptions are better suited for a dictionary (which Wikipedia is not.) The natural path for "unwanted" editors of Wikipedia is usually (disproportionately) Wiktionary. While it might be prudent to point out that such people have usually already been warned and/or blocked on Wikipedia, it is more important to point out that they are already pissed off when they arrive here. Thankfully, the good contributors that find their way to Wiktionary, still outnumber the bad contributors. Compare that scenario, to Wikinews. From the outset, one often doesn't realize that Wikinews even exists. Those contributors that do find it, tend to have news-reporting backgrounds (beyond a single high-school journalism class.) Someone interested in adding information about a trivial event will understandably balk at writing a complete Wikinews article. (Again, that is quite opposite the scenario at Wiktionary, where brevity is actually encouraged.) Taken in that light, I can see how a Wikinews-sysop could view Wiktionary as a hostile environment. From this Wiktionarian's viewpoint, Wikinews' approach seems monstrously inefficient (for good reason; if perhaps, not immediately obvious.)

Continuing in that vein, it is very hard to see why such people insist on re-adding invalid, misleading, inappropriate templates (like {{test}}) here. That, itself, is disruptive. And a waste of everyone's time. --Connel MacKenzie 23:10, 29 December 2007 (UTC)[reply]

Who is this Brian McNeil anyway? --Keene 23:27, 29 December 2007 (UTC)[reply]

n:User:Brianmc. --Connel MacKenzie 04:30, 30 December 2007 (UTC)[reply]

I raised this issue because there was a complaint in the Wiktionary OTRS queue of an indefinite block. I was corrected on the assertion that the block was indefinite but the issue still stands that Wiktionary has an image based on an aggressive blocking policy. --Brianmc 08:48, 30 December 2007 (UTC)[reply]

This IP has two contributions and was issued with a 1 month block for the first, and a 3 month block for a repeat. My interpretation of that edit is a misguided effort to insert the definition of baloney as "does not exist" - which is covered under "nonsense". --Brian McNeil / ^talk 09:33, 30 December 2007 (UTC)[reply]

Not true! That IP has two undeleted contributions, together constituting apparent vandalism, and was issued a one-month block for them that did not prevent account creation. (Personally I wouldn't have issued such a long block, but whatever. This was in 2005, anyway; I'm not sure if SB today would have issued such a long block, either.) Additionally, that IP has two deleted contributions, both of which were apparent vandalism, and one of which earned it a three-month block that (again) did not prevent account creation. (This one seems iffy — after two years, it's hard to be sure it's actually the same repeat vandal — but perhaps Connel investigated the IP address, talked to 'pedians, etc., and knows something I don't.) —Ruakh_TALK 16:02, 30 December 2007 (UTC)[reply]

And not only that. When the anon's second nonsense entry was deleted and the account blocked, the user logged in under another anon IP and recreated that same entry again, which then had to be deleted again. This was an aggressive vandal who was blocked. --EncycloPetey 16:43, 30 December 2007 (UTC)[reply]

True, but that's an argument against warningless blocks — they piss off blockees, and are easily circumvented by said pissed-off blockees — so I decided not to mention it. ;-) —Ruakh_TALK 18:08, 30 December 2007 (UTC)[reply]

That's a very odd conclusion to make. They arrived here already a vandal (both years ago and recently.) While we could be doing much better at tracking where such people go after the first block; that is harder to do, with nothing in the block log. The only reason to add a warning in a situation like that, is to allow them more opportunity to prove what is already known, while at the same time clogging the User talk namespace, the deletion log and ultimately (again) the block log. That is an example of when warnings will fall on deaf ears - the blockee was already a vandal. That is a pretty strong argument against warnings and very much for "warningless" blocks. It is unlikely that the first block was unwarranted - but without recalling the exact circumstances, it seems very probable that it was amidst a flurry of related disruption. So, incrementing to the next block duration was more reasonable than arbitrarily issuing a more typical one-day block. When they return at some point in the future, would it be helpful to see a bitter reminder pointing to previous abuses? --Connel MacKenzie 23:00, 30 December 2007 (UTC)[reply]

Insofar as I understand what you're saying, I don't disagree (except perhaps on some minor points); but my point is, saying "Our approach was ineffective: the vandal circumvented the block and continued to vandalize" is not arguing convincingly in favor of our approach. —Ruakh_TALK 05:19, 31 December 2007 (UTC)[reply]

I think that Wiktionary is an attractive potential target for vandalism, because it requires so little effort. In many ways WT is more content-inclusive than WP so there are fewer easy-to-understand (for newer contributors) standards for readily excluding bad content. This puts more burden on the discretion of those on vandal patrol. I continue to believe that we need some better means of bringing in new contributors or for converting "volunteers" who may initially have a narrow agenda into regulars, but am at a loss for particular constructive suggestions. DCDuring 18:01, 30 December 2007 (UTC)[reply]

DCDuring has made what I consider the most constructive comment. I'm not saying Wiktionary should try and adopt the Wikipedia policy of "flogging offenders to death with scented bootlaces". I'm saying you need to do more of the mundane with boilerplates so anyone - like me - from another project can see you're dealing with someone that is an unrepentant pest. My home project is Wikinews, and I'll often issue a short block to clean up and warn off the pest in question. I've even blocked a /16, so you can't exactly accuse me of being all lovey-dovey with vandals. Oh, and I've been blocked myself. ;-) Favourite reason... "Great incivility". I believe I suggested a Canadian contributor was insane and needed to remove his tinfoil hat. --Brian McNeil / ^talk 10:45, 31 December 2007 (UTC)[reply]

While we're on the subject.. I've never been very comfortable with "Stupidity" being used as a reason for blocking, especially now that it's on the dropdown menu. I admit I've used it a few times myself, but I eventually adopted a 'don't like it, don't use it' approach.. I'd like to see it replaced with something a bit less harsh, or at least something which will force a bit of education on the miscreants, perhaps a wiki-linked synonym of the term stupidity which is obscure enough that the average person will need to look up.. (we are a dictionary, after all ;-) ) .. my .02, fwiw.. --Versa geek 20:10, 31 December 2007 (UTC)[reply]

How about fatuousness? bd2412 T 18:38, 8 January 2008 (UTC)[reply]

Project Multilingual Translations (PMT)

Wiktionary is one of the best multilingual dictionary projects ever to appear on the internet. It has many great advantages, such as free availability, upgradeability, and real potential for multilingual growth, along with too many others to mention. I am a great admirer of the Wiktionary project, a new member of the English Wiktionary, and an admin of the Wiktionary in my mother tongue, Marathi.

Regarding the full horizontal growth of the English Wiktionary, and hence of all Wiktionaries, there are some points worth considering seriously. I have raised these issues once before, but I have not seen satisfactory results, so I have come up with some new, modified suggestions that build on existing projects and on my earlier suggestions.

Unfortunately, we don’t see real expected quantum of multilingual character in wiktionary. It would be quite pleasure for me; and for whom not, of course; to see translations in more than 200 – 300 languages of any word, opened in wiktionary. I know we are proceeding towards the same & it is going to take some time, but as all would agree, we need to accelerate very much. Infact, the ongoing projects like Translations of The Week aim at achieving the same. I have some suggestions which I request to be very seriously considered and implemented for fast growth. Anybody and everybody dreaming for it to happen, please please consider the following things, comment and discuss on these and let’s begin fast implementations.

Let’s call it the “Project Multilingual Translations”(PMT), or any other better name that will come forward. For the moment I refer to it as PMT.

The Translations of The Week (TOW) project is just in line with PMT. Let’s work with it, as it already exists.
There should be a team of PMT members, comprising persons from all major languages of the world. At least those languages which have developed their own wiktionaries very well can be looked to for better contribution. A member can be any person from the language concerned who is willing to work seriously and continuously. I suggest the respective admins as the first favourite candidates from each language. Any member from any language can contribute, but the PMT group member has the responsibility to confirm that the task is done. Serious contributors from various languages would be identified and invited to join PMT. (I am proposing my name as PMT group member for Marathi and Hindi. I am a recent admin of Marathi wiktionary.) We first invite, with honor, English wiktionary admins and members of the ongoing TOW. We should form a group of these members and they should be on each other's mailing lists. A language-wise list of PMT group members, with e-mail addresses to contact them, would be maintained at Wiktionary:PMT Group Members.
Daily 10 words of English would be the target. It may seem large, but mere translation doesn’t take much time for a real expert in the language, and backlogs can always be covered in case of an occasional shortage of time. Ten words daily would mean about 3,650 words annually, not really a big challenge if seen positively. In fact, being rather greedy, I expect that as PMT gathers speed this daily limit can be increased in multiples (!), but for the moment 10 is OK to begin with. If some contributors cannot visit daily, they can take up a weekly target of 70 on average and finish their task anyway. TOW is doing the same with 3 words per week. PMT would run on parallel lines with an increased target and with responsibilities truly shared by serious PMT group members of various languages.
Initially, on English wiktionary, words in English would be worked on as a common language medium. Those can be taken to any other wiktionary after primary PMT contributions are over.
A separate category (category:PMT word) would be created for those words dealt with by PMT, so that it would be easy to identify finished (and deep) work.
A template would be created for translations to identify languages translated and languages remaining to be translated. It would be just like the present translation template, but would add separate categories like category:PMT required Marathi translation, or in general category:PMT required XXX translation. This would create a list of required work in the languages concerned. Once a translation in a particular language is added, that category would be automatically removed. As an example, I would modify the word मन्दिर, which can be seen soon.
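
The category bookkeeping in the last item could be sketched roughly as follows. This is only an illustration in JavaScript, not an actual wiki template: the function name pmtRequiredCategories and the example language lists are assumptions, and only the category naming pattern is taken from the proposal.

```javascript
// Hypothetical helper (not an existing template or API): given the languages
// PMT is targeting and the languages already translated for an entry, list the
// "PMT required ... translation" categories the entry would still carry.
function pmtRequiredCategories(targetLanguages, translatedLanguages) {
  const done = new Set(translatedLanguages);
  return targetLanguages
    .filter((lang) => !done.has(lang))
    .map((lang) => `Category:PMT required ${lang} translation`);
}

// Example: Hindi is already translated, so only the other two categories remain.
const remaining = pmtRequiredCategories(["Marathi", "Hindi", "French"], ["Hindi"]);
console.log(remaining);
```

Once every target language has a translation, the returned list is empty, which corresponds to the entry dropping out of all the "required" categories.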

I am beginning the work by trying to identify serious contributors from all major languages. Those who can suggest their own or other’s names, please do not hesitate to exert for this noble cause.

Shreehari 08:17, 30 December 2007 (UTC)[reply]

This makes a lot of sense, but it is almost impossible to force volunteers to do anything :). You may be able to find the experts in foreign languages by using the {{Babel}} categories Category:User_languages, but only 109 languages exist, many of which do not reach the expert level. The multilingual contributors here have a lot they can do; WT:TTBC and Category:Tbot entries are also in need of their unique abilities. Looking at WT:TOW, it seems that it has been neglected since the beginning of November, so it would be good if you (or anyone else) were to get that up and running again; and perhaps, if there is support amongst the polylingual (I'm not, sorry), you could increase the number of translations per week. As not everyone visits Wiktionary daily (slackers ;) it would be very maintenance intensive to change the words every day, and it would IMO be better to aim for 70 a week than 10 a day (though I think that that is a very high target at the moment). Luckily, from watching recent changes, it seems that a large number of new users get going by checking translations, so your ranks may swell quickly. I would love to support this in any way I can, but not speaking any languages, it would have to be on the maintenance or technical side. Conrad.Irwin 10:14, 30 December 2007 (UTC)[reply]

I'm not sure why you offered मन्दिर as an example, but the Synonyms section is not properly formatted. Each synonym must be tied to one of the definitions; see listen for an example where the synonyms section is properly formatted. --EncycloPetey 16:38, 30 December 2007 (UTC)[reply]

I like the idea of this proposal. I wish you luck with it. However, I've found Wiktionarians to not be particularly community-minded, as there hasn't been much in the way of "official" collaborations between the users. Generally each of us does our own thing, and leaves idealistic wikiprojects like this one for the creator (plus, there's already loads of translation to be done, and there's far more foreign-language words than English ones being created on a day-to-day basis anyway). Examples of collaboration-esque projects used by a small number of users which have gone a bit stale include Wiktionary:Collaboration of the week, WT:WOTD and WT:TOW. There are exceptions though, where many users get together and drop by to pitch in to the cause (Wiktionary:Wanted entries and Wiktionary:Requested articles:English spring to mind). I look forward to seeing you create various pages on how to go about this. --Keene 16:55, 30 December 2007 (UTC)[reply]

With all due respect to the less widely used languages, I think Wiktionary's translation efforts should focus on the dozen or so most widely used languages, at least until we have the bulk of those done. It is my understanding that the vast majority of the world's people can communicate in at least one of those major languages (including English, Mandarin, Arabic, Spanish, French, German, Russian, Portuguese, etc.). bd2412 T 20:05, 30 December 2007 (UTC)[reply]

I have not looked towards maintaining TOW for a long time. Once it had a solid year's worth of translations to be checked, it sufficed as a starting point for anyone interested in maintaining it. When new translators come across it, the three items they see are invariably in need of attention. By giving new translators tiny introductions like that, they (hopefully) are not overwhelmed. Anyone who wants to, is welcome to take over TOW. What it really needs, is a cheerleader (of sorts.) I'm not sure what you are asking for here: help with cleaning translations, or permission to take over TOW. Our list of who does what (fifth item from top) shows that you are very welcome to run with it. Building on what already exists, you could add a one-line list of terms (perhaps seven) underneath the three main boxes. Since no one has maintained the list in a long time, there may be entries listed that have all their translations in order, that should be rotated out. I wish to urge caution though - ten words a day is a fine personal goal, but will burn out most translators. The "weekly" words allow translators to visit irregularly, while maintaining their focus on their primary language's project. Building a similar list for each day would be astonishingly difficult to maintain. --Connel MacKenzie 00:16, 31 December 2007 (UTC)[reply]

Well, I am very thankful to all for their comments and suggestions. As an effort to get it done, I am forming a new page for the proposed project at Wiktionary:Project Multilingual Translations, with a redirect also available through the shortname WT:PMT. It is at a quite early stage of development. Kindly keep visiting the discussion page of that project to give further valuable suggestions and comments and, most importantly, if possible, your own and other serious and consistent contributors' names for project group membership. Please wish me, this project, wiktionary and all language lovers the very best of luck for the success of this project. We will keep meeting and discussing on that discussion page. Thanks. (Kindly keep visiting that discussion page as frequently as the parlour, for we need ideas from all if this ambitious target is to be achieved. Thanks again.)

Shreehari 07:56, 31 December 2007 (UTC)[reply]

As it gets up to speed, please list it at the bottom of WT:CP. :-) --Connel MacKenzie 19:27, 31 December 2007 (UTC)[reply]

Format for Chinese translations

You might consider contacting Chloejr for collaboration on Chinese translations. He has posted a note at Wiktionary talk:About Chinese#Chinese translations, in which he is requesting feedback on the proper format for Chinese translation in English entries. -- A-cai 23:13, 31 December 2007 (UTC)[reply]

Hebrew diacritic redirects.

Should Hebrew words with diacritics be included as redirects or separate "alternative spelling" entries (or something else entirely)? In parsing Robert Ullmann's list of word variations, I see a number of instances such as אבא (redirects: אִבָּא אַבָּא). Cheers! bd2412 T 20:11, 30 December 2007 (UTC)[reply]

FWIW, Wiktionary:About Hebrew doesn't say, and neither does it seem to have been discussed on the associated Talk page. --EncycloPetey 20:16, 30 December 2007 (UTC)[reply]

They need to be included in some form, because people who don't know any Hebrew might look them up. Even people who do know some Hebrew might look them up, if they're copy-and-pasting from another site — not all browsers make it easy to delete the diacritics while keeping the letters, and if you don't want to have to retype the word yourself, you might try the with-diacritics version on the off-chance we have it. However, most of them won't actually meet CFI; and even if they did meet CFI, there's absolutely no benefit to giving them their own entry. (It's not like defence vs. defense or something, where you can look at them and not instantly know the relationship between them; "alternative spelling" entries would be misleading and annoying.) That said, I think Yiddish actually uses some of the diacritics in normal writing. And normal Hebrew writing will sometimes include a stray diacritic as a reading aid — like, signs at junctions will very often say מֶחלף so you know it's "mekhlaf" and not "makhlef" (I assume this is because Academicians got sick of people mispronouncing it) — and you'll note that w:he:מחלף is titled מחלף, and uses that form throughout, except in the first sentence.

All told, I think the best solution is:

Allow diacriticky Yiddish entries.
Don't allow diacriticky Hebrew or Aramaic entries.
Edit MediaWiki:Monobook.js's doRedirect() function: if the current article doesn't exist, and the title contains Hebrew diacritics, remove them and redirect.
If there's a worth-mentioning diacriticky version (as with מחלף), document it in a usage note at the non-diacriticky entry.

(Even better would be if there was a parser function to remove Hebrew diacritics; then doRedirect() could be intelligent and only redirect to a bluelink. Even without that, though, I think this approach is the best one.)

—Ruakh_TALK 21:12, 30 December 2007 (UTC)[reply]
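
The diacritic-stripping step proposed above might look something like the following JavaScript. This is a sketch under stated assumptions, not the actual contents of MediaWiki:Monobook.js or its doRedirect() function: the names stripHebrewDiacritics and diacriticRedirectTarget are hypothetical, the caller is assumed to already know whether the requested page exists, and the character ranges are the combining marks of the Unicode Hebrew block (points and cantillation), which is one reasonable reading of "Hebrew diacritics".

```javascript
// Combining marks in the Unicode Hebrew block: cantillation marks and points
// (U+0591-U+05BD), rafe (U+05BF), shin and sin dots (U+05C1-U+05C2), upper and
// lower dots (U+05C4-U+05C5) and qamats qatan (U+05C7). Punctuation such as
// maqaf (U+05BE) is deliberately left untouched.
const HEBREW_DIACRITICS = /[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g;

// Remove every Hebrew combining mark, keeping the letters themselves.
function stripHebrewDiacritics(title) {
  return title.replace(HEBREW_DIACRITICS, "");
}

// Hypothetical decision helper: returns the title to redirect to, or null when
// no redirect applies (the page exists, or the title has no Hebrew diacritics).
function diacriticRedirectTarget(title, pageExists) {
  if (pageExists) {
    return null;
  }
  const stripped = stripHebrewDiacritics(title);
  return stripped !== title ? stripped : null;
}
```

Because the helper does nothing when the page exists, Yiddish entries whose titles legitimately carry diacritics would be unaffected. As noted above, it cannot tell whether the stripped title is itself a bluelink; that check would need something on the server side.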

Interesting, but what is done about direct wikilinks to something like מֶחלף? Should they be allowed (with manual redirects entered) or "fixed" when encountered? Just wrapping it with {{Hebr}} seems inadequate, as it doesn't link the item. --Connel MacKenzie 21:23, 30 December 2007 (UTC)[reply]

I don't know. I intentionally left #2 ambiguous between "Have diacriticky Hebrew and Aramaic titles redirect to the corresponding entries" and "Have diacriticky Hebrew and Aramaic titles be redlinks", because I don't know which is better. (But for the record, {{Hebr|[[מחלף|מֶחלף]]}} works fine, producing מֶחלף.) —Ruakh_TALK 05:10, 31 December 2007 (UTC)[reply]

Well, if a Yiddish word using those characters exists, we ought to have it, but if not? Is it not our policy that redirects should almost never be used except for variations of idioms? bd2412 T 03:19, 2 January 2008 (UTC)[reply]

There are language-specific exceptions to that rule - I guess I'm indirectly asking what the resolution was (apparently - no resolution yet?) for Hebrew. Either way, I think the parameter-links (to the non-markedup form) are the only desired links - which seems to be an argument against those redirects? --Connel MacKenzie 08:22, 2 January 2008 (UTC)[reply]

I agree with Ruakh's points of 21:12, 30 December 2007 (UTC). As far as redirects, I say definitely don't delete the ones we have already (redirects are cheap), but only add them for (1) words like מֶחלף that are commonly seen with vowels (i.e., diacritics), (2) words with greater than one Whatlinkshere (in the vowelized form), and (3) very common words (let's say, Swadesh words). That's assuming we can get monobook.js tweaked as above; if we can't, then I say keep and write any and all redirects.—msh210℠ 18:19, 2 January 2008 (UTC)[reply]

Additional points: (1) This discussion on Hebrew can equally be a discussion on Hebrew-character-written Aramaic. (2) Yiddish, though, is a whole other story: (I don't know Yiddish, really, but as far as I know) it is typically written with certain diacritics (which, in Yiddish, are not vowels), and the PAGENAMEs should reflect this.—msh210℠ 18:22, 2 January 2008 (UTC)[reply]

But if the words are "commonly seen with" diacritics, and are correct either way, isn't that a better argument for listing as an alternative spelling? Isn't the spelling with vowel marks, in Hebrew, the traditional and "correct" form, with the markless forms being a modern variation? bd2412 T 19:11, 2 January 2008 (UTC)[reply]

Please take back that last question: we don't want to start that discussion. But to answer your first question, yes, words like מֶחלף are commonly seen with (some of their) vowels, and, as such, you'd think they'd have separate entries. I suppose they can. It seems pretty silly to me, and I suspect it seems pretty silly to Israelis (which I'm not), but there's nothing wrong with it, and it seems to conform to our policy of having separate entries for naive and naïve.—msh210℠ 19:26, 2 January 2008 (UTC)[reply]

Re: your first question: The problem with labeling it an "alternative spelling" is that it's not so much an alternative spelling as an alternative way of writing the same spelling, if that makes sense. (I can't think of a good English analogy. It's not quite the same as the difference between <a> and <ɑ>, or between <f> and <f>, or between <foobar> and <foobar (FOO-bahr)>, but it's in the same general category.) Re: your second question: No, not at all. The whole reason diacritics are used for the vowels is that historically the vowels weren't written at all, until the Masoretes invented the symbols for them. Since they didn't want to alter the text of the Bible, they invented symbols that were essentially many tiny glosses: the text of the Bible was still there and unaltered, but little dots and lines around the letters indicated additional pronunciation and grammatical information that the text itself did not. The vowelless style is the original form, and has always been the more common. —Ruakh_TALK 00:33, 3 January 2008 (UTC)[reply]

Still, it's not quite like writing the same characters in a different font - aren't there circumstances where a word has two meanings, one of which is written with one set of vowel marks (where they are used) and another of which uses a different set of vowel marks? bd2412 T 02:39, 5 January 2008 (UTC)[reply]

Tons — like how two words in English might be spelled the same but have different pronunciations. And just like how in English we might, if necessary, write something like "contract (contract, I mean, not contract)", in Hebrew we might, if necessary, throw in a single disambiguating diacritic and write something like "מֶחלף". And just like how in English you'd only clarify once and thereafter you'd just write the word, in Hebrew you'd only clarify once and thereafter you'd just write the word. (There are exceptions — prayer books, hardcover Bibles, poetry, children's books — where most diacritics are included, but even then there's variation in just what diacritics they bother with.) But no comparison to English is really going to work perfectly; for example, the introduction to one of my Hebrew–English dictionaries includes dageshes in bets, kafs, and peis, but not in other letters, and dots in or above vavs, and sin dots but not shin dots, and then other random diacritics only when it thinks they might help. That specific pattern is not a common practice — I've never seen it anywhere else, actually — and many of the "spellings" it results in wouldn't meet CFI; but given how easy it is to map those "spellings" to normal "spellings" (just remove all diacritics), there's no reason not to support them, as well as all other vagaries of Hebrew diacriticism, with JavaScript redirection. —Ruakh_TALK 04:16, 5 January 2008 (UTC)[reply]

How about this, then: if a word has multiple meanings, only one or a few of which can correctly be represented with a particular set of diacritics, that should be considered an alternative spelling (or maybe we come up with a more accurate way to describe it, like "alternative representation"). bd2412 T 20:17, 6 January 2008 (UTC)[reply]

Why? Maybe I'm missing the forest for the trees, but I just don't see the benefit of that approach. —Ruakh_TALK 21:19, 6 January 2008 (UTC)[reply]

Because a redirect from a use that does not apply to a particular possible meaning of a word would be incorrect, wouldn't it? I suppose we could leave them as redlinks, but someone may try to look up the term with diacritics, and they should be able to find something that shows that such a usage does in fact exist (i.e. is not incorrect). Between redirects and alternative entries, I'd prefer an entry, in that case. bd2412 T 00:16, 7 January 2008 (UTC)[reply]

Hmm. But wouldn't an "alternative representation" entry be just as incorrect? —Ruakh_TALK 02:29, 7 January 2008 (UTC)[reply]

Not if a different usage with additional information-conveying marks is itself correct. bd2412 T 18:43, 7 January 2008 (UTC)[reply]

I'm sorry, but I've read your comment a few times, and I simply don't understand it. Could you rephrase it or something? —Ruakh_TALK 03:41, 8 January 2008 (UTC)[reply]

Ok, let me put it this way. If, hypothetically, אבא has two meanings, but only one can correctly be written as אִבָּא, then we should definitely have an entry for אִבָּא because it is a correct way to represent one meaning of אבא (e.g. we don't want to lead people to think that אִבָּא is meaningless when it can have a meaning); but we can't rightly use a redirect because that might lead people to think that אִבָּא is a correct way to represent all meanings of אבא. Ergo, we need a separate entry for אִבָּא indicating the one meaning it can represent. bd2412 T 03:59, 8 January 2008 (UTC)[reply]

O.K., I see what you're saying, but I disagree: every inflection line at the unvocalized page gives the corresponding full vocalization. (At the Hebrew Wiktionary, it's even better in this regard — the fully-vocalized versions are the L2 headings, which makes them impossible to miss — but the trade-off is sharp, and I'd never suggest we emulate them.) But perhaps I'm biased by the implication of your argument, which is that a large proportion of Hebrew entries — the vast majority, I think — should be duplicated many-fold, once for every possible vocalization. (This is even if we ignore the diacritics used to indicate cantillation; but by your argument, we'd need to include those, too. That means that many words in the Bible would appear something like a dozen times — once for each possible cantillation mark — in their fully-vocalized form, since most words can take most cantillation marks.) And anyway, your reasoning doesn't even reflect how we do it for English; we don't have an entry for And, for example, even though that's a correct way to represent and (and in fact and must be written And in some contexts). —Ruakh_TALK 00:15, 9 January 2008 (UTC)[reply]

New Year

Out of curiosity, does the change of year cause any problems to Wiktionary? Are there any local rituals that happen at the turn of the year, maybe software-related? --Keene 01:26, 31 December 2007 (UTC)[reply]

Like sacrificing an admin to MediaWiki? -- Visviva 06:26, 31 December 2007 (UTC)[reply]

OK. Anything to stem the tide of OTRSes. Sign me up. --Connel MacKenzie 17:36, 31 December 2007 (UTC)[reply]

People tend to archive their talk pages - then get on with it. SemperBlotto 08:48, 31 December 2007 (UTC)[reply]

And see Wiktionary:Administrators/Dishwashing for who archives long Wiktionary discussion pages. SemperBlotto 08:50, 31 December 2007 (UTC)[reply]

Some of the discussion rooms and such will get a new section header, but that is (in most cases) a monthly occurrence rather than yearly. --EncycloPetey 17:08, 31 December 2007 (UTC)[reply]

I like the first answer best. --Keene 17:13, 31 December 2007 (UTC)[reply]

So there's no "run-down of the year", featuring the best bits of the last 12 months, including interviews with prominent Wiktionarians looking back on their favourite moments and dubious top-100s? A shame if not. Maybe for 2008 I'll do a chart. Wiktionary:Best 100 bits of 2008 - watch this space! --Keene 17:18, 31 December 2007 (UTC)[reply]

Why not do it here - what are your favorite WT moments of the last 12 months, people? Mine might be my work with my trilobites--Keene 17:22, 31 December 2007 (UTC)[reply]

Wiktionary reached 500,000 entries in July of this year.
We have a lovely new Main Page look.
Lots of inflection templates, too. With help from Medellia and Conrad, we fixed the templates for Latin adjectives, verbs, and created one for adverbs. Opiaterein and Panda10 have worked to develop noun and adjective templates for Hungarian--our first inflection templates for that language.
A-cai took the reins on pages tagged with {{zh-attention}} and has done wonders with getting that list under control.
(and several others as well on ja-attention, Cynewulf, Balloon guy) when I set those up, I thought it would just keep track of those for a long time until eventually they were fixed; but a number of people dug right in ;-) Robert Ullmann 15:19, 3 January 2008 (UTC)[reply]
WOTD now has RSS feeds and is on the daily mailing list.
Transwikis from Wikipedia are now GFDL compliant and semi-automated.
AutoFormat
Interwicket, RatPatrol, WT:WL and numerous other automations.
What is RatPatrol?
See User:Robert Ullmann/Rat Patrol. -- Visviva 07:53, 2 January 2008 (UTC)[reply]
More importantly (since this code came from Africa!) w:The Rat Patrol. :-) --Connel MacKenzie 08:15, 2 January 2008 (UTC)[reply]
A GFDL logged IRC channel, irc://irc.freenode.net/wiktionary-gfdl.
Hundreds of minor WT:PREFS improvements.
parser.js et al.
Tbot
Completing the DictList of English entries
Ranking skyrockets.
Addition of the ISO language parameter to all the "Webster" etymology templates.
...

And looking forward to 2008 - the year when :-

Italian overtakes English
Section "add" wizards are enabled (e.g. "Add a Spanish translation for this sense"...)
Wikisyntax errors are warned before saving
Language separation tools improve (Random page --> Random &uselang=English Page; Special:Index/lang=English, etc.)
Preferences are "Gadget"-integrated with WT:PREFS.
En.wikt tops 100,000 registered users (>1,000 active)
*.wiktionary.org surpasses dictionary.com here.
Daily incremental XML dumps. (One can dream, right?)
User:Robert Ullmann/Day lists + Special:Export + Connel is doing some XML-merge magic. (dream on ;-) Robert Ullmann 15:19, 3 January 2008 (UTC)[reply]
...

Italian? Impossible! Doesn't have enough words, even with all the inflected forms. For 2007 I am personally most pleased that variations appendices have replaced the most encumbered "see..." templates, that all pinyin tone transliterations now have entries, linked from all of the corresponding characters, that we've jacked up our admin numbers, and that we have removed lots and lots and lots of junk from the dictionary. I think we've really got a handle on vandalism now, and look forward to keeping it tamped down even further in '08 (and adding hundreds of words from various sources). bd2412 T 03:25, 2 January 2008 (UTC)[reply]

My 'bot (User:Keenesnewbot) will work beautifully, flodding the dictionary with French and pushing English down into 3rd place after Italian. --Keene 04:04, 2 January 2008 (UTC)[reply]

<JOKE> We already have flooding of new entries, but I don't know that we want flodding.</JOKE> :-) --Connel MacKenzie 08:12, 2 January 2008 (UTC)[reply]

Could someone with access to statistics on page hits put up a list of the most viewed wiktionary entries? --Bequw → ¢ • τ 20:36, 2 January 2008 (UTC)[reply]

I think there's (now) a link at the bottom of WT:STATS, but anyhow, Wiki Charts works. --Connel MacKenzie 23:41, 2 January 2008 (UTC) (edit) 00:12, 3 January 2008 (UTC)[reply]

Q re orphan requests for verification or deletion

If an anon puts an {{rfd}} or {{rfv}} on an entry but fails to follow up by entering the request on the Wiktionary:Requests_for_deletion or Wiktionary:Requests_for_verification page, is it OK just to remove the tag from the entry? Right now I'm looking at obtrusive. -- WikiPedant 15:22, 31 December 2007 (UTC)[reply]

You have to use your best judgement on whether it is a good faith nom by someone not conversant with procedure or if it's vandalism. In this case the tagged sense was odd enough I added it to RFV. RJFJR 15:31, 31 December 2007 (UTC)[reply]
