Last modified on 23 May 2014, at 21:23

Wiktionary:Beer parlour/2008/May

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Special:ListUsers, Does this bug anybody besides me?

I think the first page of the master user list is rather unfortunate. Yes, the user's account was deleted--so the link's red now, if it used to be blue. Can't any more be done? Is there an überdelete? If not, how about resurrecting the user, moving the account to a plain vanilla name, and then deleting again? Snakesteuben 07:11, 1 May 2008 (UTC)

  • There doesn't seem to be a "delete user" function (I was asked to delete one yesterday). All we can do is rename them (but they can then be recreated). SemperBlotto 07:24, 1 May 2008 (UTC)
Block user --> Prevent account creation won't keep it from being recreated? If the page and user functions are indeed separate, then the blocked user name shouldn't change when the page name is changed. But I admit I haven't tried it, yet... Snakesteuben 08:45, 1 May 2008 (UTC)
Edit: Nah, that doesn't work, not exactly like that, anyway... Snakesteuben 08:47, 1 May 2008 (UTC)
Well, a bureaucrat can actually rename a user; see WT:MV. (This will automatically move the user and user discussion pages as well, but as you note, that's conceptually a separate step.) So we could rename the user to something innocuous, and then add the original username to the accounts-blocked-from-creation list. Actually, we could more generally add ^! to that list and consider the problem solved. —RuakhTALK 12:18, 1 May 2008 (UTC)
It is easier to use the list at Special:Allpages/User_talk: - though that only lists users who have been talked to. I believe there are plans to kill the thousands of dead accounts as Unified Login progresses - but 'til then we'll have to just put up with it. Conrad.Irwin 09:12, 1 May 2008 (UTC)
Ruakh: There ya go!
Hi, Conrad. That's not my concern; I haven't much use for the page myself. What about the public, or the casual contributor who isn't part of this community--and hasn't ferreted out the easier ways to do things? I think one might reasonably look at the registered contributor list to figure out who's behind all this, and then gauge the credibility of the source. If I'm right, then that page doesn't help us, and might lend credence to the anarchy-ergo-unreliability theory. 'Course if I had my way, for that very reason, the default display would sort users by UserPage.Qualifications.Impressive + Contributions * Signal_to_Noise_ratio or some such, rather than alphanumerically. ;-)
But seriously folks, yes, you and I can be expected to put up with a fair bit. But just like paper books, wikts must provide value to more than just their authors and developers. While the user interface we present to the public might not be as important as a book jacket, it's still part of the package, not irrelevant. I think anyway. Snakesteuben 15:23, 1 May 2008 (UTC)
Afaik the official MediaWiki answer is that there is no user delete function and the database should not be touched because it could be dangerous for db integrity. But on my personal MediaWiki I just did a "delete from user.." on the MySQL console anyways and MediaWiki didn't explode; it hasn't seemed to cause any trouble so far. Mutante 21:40, 4 May 2008 (UTC)
The question isn't whether or not it can be done, it is whether or not it will be done. Yes users can be deleted, anything can be deleted given the proper amount of legwork, but we don't generally do it without very good reason. I think that it is far easier just to rename the one offensive username from the first page of the list and move on, we can blacklist any names which are recreated if it comes to that. It isn't worth bugging a dev about it, unless someone wants to create a username_delete extension... - TheDaveRoss 21:49, 4 May 2008 (UTC)
Someone, somehow, made it go away. I am content. Snakesteuben 03:32, 9 May 2008 (UTC)

But no one knows the POS!

I just created the entry for θεπτάνων (theptánōn). It's an incredibly obscure Ancient Greek word, which is only attested in a fifth century Ancient Greek dictionary of obscure and archaic Ancient Greek words, written by w:Hesychius of Alexandria. So here's the fun part: No one knows what the POS was for sure. It could be an adjective (on fire), a noun (something which is scorched, on fire), or as I expect, a participle (≈ being on fire). Now, these are close enough together that we can reasonably give it a definition, but I don't feel confident giving it a POS (hence the POS is listed as Unknown). One of the really cool things about Wiktionary is that we can include incredibly obscure and esoteric words like these. However, we may want to discuss how we want to do things when dealing with words which have incomplete information. To give an even more interesting example, take the Phaistos Disc. No one knows what any of that means. However, for ancient Aegean linguists, it's very important stuff. I want to, eventually, include all of these words on Wiktionary (Unicode is waffling over whether to encode them). If they do, Wiktionary is the perfect place to have them. We can discuss various theories as to their meaning, look at similar characters in other scripts, etc. It's a classicist's wet dream. But how do they get formatted? Now, this is not an urgent thing, as there are plenty of less esoteric words which we still lack (like the verb ἅπτω (háptō), the participle of which is given as the definition of θεπτάνων (theptánōn)). However, I thought I'd throw it out there just to get the ball rolling. -Atelaes λάλει ἐμοί 23:51, 1 May 2008 (UTC)

I think you should at least put Unknown Part of Speech instead of just Unknown, without Part of Speech a casual reader would not know what Unknown refers to. Nadando 23:54, 1 May 2008 (UTC)
I would think you would take a best guess at PoS. In the leading case you mention, you have a fairly good idea, so to say unknown seems to mislead. The underlying question is which of the specialized needs of researchers can productively co-exist with the needs of the more ordinary users. In the case of the Phaistos Disc, it would seem to belong in something more specialized or perhaps the Ancient Greek Wiktionary (if it comes into being), where it would attract all of those most able to decipher the material. Frankly, if there isn't unicode, then it would seem to be more of a WikiCommons thing for the images. DCDuring TALK 01:07, 2 May 2008 (UTC)
According to Wikipedia, Unicode codes now exist for these symbols. Lmaltier 21:42, 2 May 2008 (UTC)
For really exceptional cases like this, IMO it makes sense to put scholarly interpretations in place of documented use. That is, if there is a school of thought that this is actually a genitive-plural noun, have a "Noun" heading, with appropriate qualifiers in the sense & usage lines. If some possible POS's have only been mentioned as possible interpretations (and never seriously championed), those should be relegated to the notes.
In the case of the Phaistos Disc (likewise, other undeciphered writings), I would expect us to use ===Symbol===, though what the L2 header might be I haven't a clue -- perhaps there is a better use for "Unknown". -- Visviva 14:19, 2 May 2008 (UTC)
I’d also suggest you pick one POS, whichever seems the most probable or the least contested, and add a Usage notes section explaining the issue.
By all means, do include those Phaistos symbols. Although maybe an Appendix would be more appropriate. H. (talk) 08:17, 9 May 2008 (UTC)

Proposed change in ToC display

In this discussion, Conrad.Irwin has proposed some CSS code that would cause ToCs to float to the right of the entry text, instead of sitting on the left and pushing everything down. This probably would not have worked smoothly in the past, but seems to work fine now, thanks to Robert Ullmann's great work in sorting out the float properties of various floaty things (here endeth my understanding of that matter).

The specific code would be:

#toc {
  float: right;
  clear: both;
  margin-left: 5px;
  margin-bottom: 5px;
}

Since this would be a very significant change to entry display sitewide, I am posting this here to the Beer Parlour rather than the Grease Pit. Please voice any concerns or objections here. For my part I support this change, which IMO makes entry navigation significantly more straightforward. -- Visviva 15:53, 2 May 2008 (UTC)

Perhaps the margins could reflect the existing document grid. The bottom could match an image thumb (6px), or use the line-height of text (1.5em, computed as 19px in my browser). The left could use the same margin as the navigation boxes in the left column (7px), or the main column of text (12px). Michael Z. 2008-05-02 18:21 Z
I'm pro–, but I think there should probably be a corresponding #toc-float-none #toc rule-set that undoes it, and perhaps a #toc-float-left #toc rule-set that floats left instead (with appropriate margin changes), so we can have {{tocnonfloat}} (and perhaps {{tocleft}}) for cases where they might be useful. (Those are probably bad names, but you get the idea.) —RuakhTALK 21:51, 2 May 2008 (UTC)
I think this is an excellent notion. One area where I'm not thrilled with this is in big community pages like this one, where the standard-issue TOC is actually better; since there aren't many such pages, it would be easy to apply a template where appropriate. -- Visviva 01:38, 3 May 2008 (UTC)
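For illustration only, the opt-out rule-sets Ruakh describes might look something like the sketch below. It assumes that a template such as {{tocnonfloat}} or {{tocleft}} (both names are placeholders from the comment above, not existing templates) would wrap the ToC in an element with the hypothetical id toc-float-none or toc-float-left; the margin values are guesses, not tested against any skin.

```css
/* Base rule, as quoted in the proposal: float the ToC to the right. */
#toc {
  float: right;
  clear: both;
  margin-left: 5px;
  margin-bottom: 5px;
}

/* Hypothetical opt-out: a wrapper with id="toc-float-none" restores
   the non-floating ToC. Two ID selectors outrank the single-ID base
   rule, so this wins regardless of source order. */
#toc-float-none #toc {
  float: none;
  clear: none;
  margin-left: 0;   /* assumed value; the skin's own default may differ */
  margin-bottom: 0; /* assumed value */
}

/* Hypothetical left-floating variant: the margin moves to the right
   side so that entry text does not touch the box. */
#toc-float-left #toc {
  float: left;
  clear: left;
  margin-left: 0;
  margin-right: 5px; /* assumed to mirror the 5px used above */
}
```

Because the overrides rely on selector specificity rather than on their position in the stylesheet, they should keep working if the base rule is later given a namespace-class prefix, since one class plus one ID still ranks below two IDs.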
This would be a problem for all the right-floating items we have, such as WP link boxes, {{was wotd}}, and images. If the TOC floats right, then these items either (1) are shoved left into the entry text, (2) hidden by the TOC, or (3) shoved down into the collapsible tables. Is there a proposal to deal with these problems? --EncycloPetey 22:16, 2 May 2008 (UTC)
The TOC on the right seems to work on WP without problems where it is used. However, like EP I'm uncertain if it would work with our page structure - take a look at pages like head, bassoon and router. How would these work with a TOC on the right? Until I've seen mockups that show how these entries (or ones like them) would be with the TOC on the right I would oppose any changes to the status quo in this regard. Thryduulf 00:02, 3 May 2008 (UTC)
For pages like head, see User:Visviva/head. added: note that that matches the CSS behavior in FF, but not in IE, where it behaves weirdly.
For bassoon, the right sidebar currently renders like this: pediabox, TOC, image. Not ideal, but I think we might reasonably ask whether a TOC is needed on that page at all.
The images in router get pushed down, but not too far: on FF for me, the first image is level with sense 1, and the others stack neatly down to the carpentry sense. (hey, those images should really be in a gallery anyway.) ;-) -- Visviva 01:38, 3 May 2008 (UTC)
Here's how it renders for me in FF: any right-floating thing (image, pediabox) above the first language header displays above the TOC; any right-floating thing below the first language header displays below the TOC. Using the __TOC__ magic word, it is possible to tweak this if it's not quite the desired behavior -- i.e., forcing the TOC to appear above or below a certain point. (Though honestly, I'd been thinking more in terms of preventing pediaboxes from grabbing this prime real estate.) :-)
AFAICT, this doesn't affect {{was wotd}} at all. -- Visviva 01:38, 3 May 2008 (UTC)
Having difficulty getting this to work in IE6; perhaps someone CSS-knowledgeable can explain why? -- Visviva 01:38, 3 May 2008 (UTC)
I don't have IE6 to test with, but I understand that it has a bunch of problems with margins on floats, and that some of those problems go away if you set display:inline on the floating element. (This is discussed in various places online, e.g. at <http://www.positioniseverything.net/explorer/floatIndent.html>.) If the difficulty you're experiencing has to do with the margins, it might be worth trying. —RuakhTALK 04:28, 3 May 2008 (UTC)
Thanks. Actually the difficulty I'm experiencing is that absolutely nothing changes, even after a full cache dump. Tried on another computer, since this one is having issues -- still no change, and likewise when I add "display:inline". Odd; I suppose it wouldn't be a deal-breaker (since IE6 users would just get the same layout as before), but it doesn't seem right. -- Visviva 06:00, 3 May 2008 (UTC)
BTW, wouldn't clear:right make more sense for this than clear:both? —RuakhTALK 04:22, 3 May 2008 (UTC)
Update to code to reflect ideas given - this works for me in IE6. Conrad.Irwin 20:45, 3 May 2008 (UTC)
.ns-0 #toc {
  float: right;
  clear: right;
  margin-left: 7px;
  margin-bottom: 6px;
  display: inline; 
}
As this seems to be popular I suggest we give it a trial in the next few days. Conrad.Irwin 20:45, 3 May 2008 (UTC)
Popular? I see two people who've voiced support and two who've raised objections. What definition of popular does that fall under? --EncycloPetey 03:46, 4 May 2008 (UTC)
My definition 'cos I like it - I would count it 4/2 :). Anyway, there is now a new item in WT:PREFS to allow this to be previewed more easily. It should appear at the bottom of the list under the search spellchecker - if not then you will need to hard refresh. Conrad.Irwin 11:04, 4 May 2008 (UTC)
It would be helpful if you can clarify what concerns you feel haven't been addressed. To review your previous concerns again: due to the recent CSS revisions, if the TOC pushes down into the collapsible tables, the collapsible tables simply shift left (no collision). Images and boxes above the first language header render just as they do now; images and boxes below the first language header render below the TOC. In entries that already have a cluttered right margin (multiple boxes), this can get a little messy, but I would submit that those entries are in need of cleaning anyway (we have {{pedialite}} et al. for a reason). In any event, AFAIHS the TOC does not collide with anything, nothing is pushed into the entry text, and {{was wotd}} is not affected.
Anyway, I've been running this for a few days and I'm sure not going back to the old layout. I just think it would be nice if we can offer this improvement to the general user; it seems unfair to keep usability improvements to ourselves.  ;-) -- Visviva 07:03, 5 May 2008 (UTC)
  • One issue I have seen in FF: when the browser window is reduced, a {{wikipedia}} box rendering below the TOC actually blocks out some of the definition text. This seems to be a problem in {{wikipedia}}, just made more apparent by this change; however, I'm not sure which of the various style declarations therein might be responsible. -- Visviva 07:03, 5 May 2008 (UTC)

Wiktionary:About Ukrainian

I've started this language article. Michael Z. 2008-05-03 05:02 Z

A modest proposal, re: bad translations on the internet

I'm sure everyone who has dabbled in ttbc has noticed a lot of misspellings, and even totally mythical translations on purported "dictionary" and "translation" sites around the internet. It seems any time one site starts a rumour, the others pick it up. And before long, they drop in a citation to "various references," which I assume means each other. Unfortunately, en wikt is frequently a member of this group. (I recently deleted one such "translation" for the second time--and it was in the main section, not ttbc unfortunately.)

Is there some policy governing how to deal with/prevent this kind of thing?

If not, what do you guys think of maybe creating either:

  1. a list or category of these things, with links to/from the affected words, or perhaps even better
  2. an entry akin to "common misspellings" for the myth word. (If we do this, we should call it something else--these aren't common misspellings, they occur nowhere except on the braindead sites, and sometimes in posted messages by obvious English speakers who were duped into using them.)

Snakesteuben 02:42, 4 May 2008 (UTC)

I like option 2, at least for the serious problem cases. If we don't actively address these misconceptions, they will keep getting added when no one is looking. There could be a standard "Common Foovian mistranslations" (or something) category generated by the template. The exact format of the template bears some thought -- should it include the preferred translation, the putative English equivalent, or both? And what should these be called? "Mistranslations" is a bit too broad. -- Visviva 09:15, 4 May 2008 (UTC)
Good ideas, Visviva. In the meantime, I think I'll start noting such things as hidden text comments next to the relevant entry in the translations sections. (I'm guessing a quick consensus is unlikely, and I'm not senior enough here to take semi-unilateral action ... though you probably are.) Winter (Username:Snakesteuben 02:44, 12 May 2008 (UTC))
Maybe something like the way I dealt with the "phobias" would work here, namely {{only in}}. See Appendix:Invented phobias and aurophobia or Category:Wiktionary pages that don't exist. Though I'm not sure how much support this has either, it strikes me that the situations are similar. Conrad.Irwin 09:33, 12 May 2008 (UTC)
{{only in}} (or an enhanced descendant) could be of great use for keeping persistent bad full entries out of principal namespace and for getting some better use out of the Appendices. Addressing more of the full range of user and contributor "errors" might be valuable for reducing vandalism and speeding users toward the entries that they really need. In contrast to redirects, but like {{misspelling of}}, it compels users to note that they have made an error. This seems like yet another good use. DCDuring TALK 12:26, 12 May 2008 (UTC)

User Richardb

User:Richardb has been confronted with repeat copyright violation (see User_talk:Richardb#Citations pages) which was dumped into the Citations namespace. His responses were "I'll leave them there and let another editor format them" [1]; "since you like being the policeman so much I'm sure you'll get far more enjoyment out of doing the deletions" [2]; and "Aw sod off the lot of you. Get a life. I'm busy putting decent stuff into the Wiktionary. Can't be bothered with you boring lot. Won't be replying to this crap any more." [3].

Copyright violation is too serious for such a flippant attitude, particularly for a Wiktionary administrator. I'm now of the opinion Richardb should be desysopped (at the least) and possibly banned if this continues to be his attitude towards violating copyright law. --EncycloPetey 06:49, 4 May 2008 (UTC)

Agree. I'm sorry to say that this user's words and deeds have pretty thoroughly ruled out good faith, and appear to indicate that he poses an unacceptable risk to the project. Further, this is not the first occasion that Richardb has indicated he does not consider himself bound by community norms. Lapses of judgment or temper are one thing, but that is not really an acceptable attitude for an admin.
I don't think an outright ban is necessary, provided that Richardb lives up to his (apparent) commitment to stop engaging in copyvio. He has made valuable contributions here, and hopefully will do so again in the future. -- Visviva 07:10, 4 May 2008 (UTC)
However good a contributor is otherwise, this attitude towards copyright violations is completely incompatible with Wiktionary, and doubly so of administrators. Unless he changes his tune very quickly I don't see an option other than formally requesting he be desysopped. Thryduulf 08:11, 4 May 2008 (UTC)
Well, although I don't know if it counts for a change of tune, his remark of 22:04, 3 May 2008 (UTC) on User talk:Richardb#Citations pages seems to indicate that he does not intend to continue, although he also does not intend to clean up after himself. AFAICT cleanup is now complete in any case, so there does not appear to be an imminent threat to the project. Nonetheless, IMO the risk of having an admin with such open disregard (as it appears) for the most fundamental principles of Wiktionary is still great enough to justify desysopping. Not sure what the procedure for that is... I believe stewards look for clear local consensus, but I'm not sure if that would require a formal Vote or not. -- Visviva 09:10, 4 May 2008 (UTC)
De-sysopping would not prevent a recurrence of the specific copyvio problem and risks causing worse problems. The copyvio issue is easy to fall afoul of, speaking from experience. I'm more concerned with seemingly petulant responses to reasonably polite and even very polite feedback. We have seen some fairly disruptive behavior by some of those who feel unhappy with and alienated from the Wiktionary culture. The disruption wastes our time when it occurs, even though it is remediable. In this case "you boring lot" is a possible sign of that kind of unhappiness and alienation. It would be better to have Richardb on board and contributing than hostile and non-contributing. Realistically, we are better off to let slide troubling incidents separated by months. But the reservoir of AGF good will does decline with every incident. Ordinary contributions alone do not restore it, IMHO. DCDuring TALK 09:44, 4 May 2008 (UTC)
Hear, hear. Widsith 12:07, 4 May 2008 (UTC)
The only reason for removing the sysop flag that I can see is that were anyone to sue WMF about copyright issues the fact that he is an "administrator" doesn't look good. There hasn't been an abuse of the tools. I do agree that it would be best if Richardb at least stopped behaving the way he has been, and perhaps is willing to clean up the stuff that is questionable. I don't think that it is necessary for us to de-sysop him, if he doesn't want to play along anymore maybe he wants to volunteer to step down. - TheDaveRoss 16:54, 4 May 2008 (UTC)
I don't know. He seems to be fairly iffy now on the general topic of following community rules; for example, when I mentioned AGF to him, his response was basically a flat-out refusal to abide by it. So far, none of his willful rules violations has involved admin tools (granted, he deleted RFDO once, but that seems to have been mostly accidental), but do we think he draws a clear line there — "I'll break the rules that anyone can break, but not the ones that only admins can break"? If not, I don't see the point of waiting until he's actually abused the admin tools. Adminship is a matter of active trust, not a passive "benefit of the doubt"–type trust. Personally, I'd have preferred that we talk to him about this; but as soon as he stated openly on his talk-page that he refused to engage in further discussion of his misbehavior, I think EP did rightly in bringing this here and raising the possibility of de-sysopping. (That said, DCDuring makes an eloquent appeal for not de-sysopping him, and assuming that he does indeed stop with the blatant copyright violations, I'm quite happy to hold off until the next "troubling incident", whenever and whatever that might be.) —RuakhTALK 21:17, 4 May 2008 (UTC)
If people feel strongly about it it is worth a vote. You are right about the trust issue, and I think that there have been a few questionable behavioral issues in the recent past with Richardb. A "no confidence" type vote might give people the opportunity to voice their concerns and comments. I don't know that it would succeed, but it would bring out some until now silent voices of defense. - TheDaveRoss 21:32, 4 May 2008 (UTC)
Some people make it really easy to flip from being a good sysop to a bad one. Not that we don't have bad ones to begin with, they're just bad in different ways. Certain sysops have made me a lot less willing to take part in community discussions. Kinda sucks that people who can make wiktionary so unpleasant can still be considered worthy of their sysop powers while other transgressions are held to be more...diabolical. mwahahaha (Kinda not on the exact point, I just wanted to say this) — [ ric ] opiaterein — 13:15, 5 May 2008 (UTC)

Pinyin without tone markings

We are suffering from an epidemic of these lately. The entries added seem to fall into three categories:

  1. Entries that are tone-marking-free versions of otherwise valid Pinyin words: jinu, tiao
  2. Words of type 1 that may actually be used in English and other diacritic-averse languages: Hanyu Pinyin, Guomindang (?)
  3. Alleged Pinyin misspellings, particularly involving the letter "v" (is there some sort of variant system at work here?): lvxing, jinv

I'm assuming that types 1 and 3 should be deleted with prejudice, while type 2 should be converted to English. Is that correct? It would be nice if some guidance on these points was added to WT:AZH. -- Visviva 03:40, 5 May 2008 (UTC)

There's nothing wrong with entries without the tones, as long as the tones are specified. They're useful because you can see which words have almost the same pronunciations except for the tones. In theory, we could keep ONLY these while specifying the tones in the headword, instead of keeping entries for every different pronunciation with tone marks in the page title. (Latin doesn't specify which characters have macrons in the page title, why should Chinese specify the tones?) Note also that we don't really have a system for marking tones for Cantonese, without the numbers. So when we manage to find a good Cantonese contributor, what then? Entries like wong4fan1? Whatever decisions we make about this can't be so hasty. — [ ric ] opiaterein — 13:09, 5 May 2008 (UTC)
Well, personally I'd like to see evidence that pinyin is ever used to write Chinese by actual Chinese speakers communicating with other Chinese speakers. Here and elsewhere, I've seen claims that it is used in children's books (but nobody seems to have a specific book title or ISBN handy), and that it is or has been used for internet communication due to the complexities of encoding (but the only uses of Pinyin on Usenet seem to be by/for learners). But that's more of a general issue... At any rate, if Pinyin is really used for Mandarin, but not really used for Cantonese (etc.), then it seems obvious to me that only Mandarin Pinyin entries should be permitted here, and only in the form in which they are actually used.
Regarding Latin, I'm given to understand that the reason is that Latin has seldom/never actually been written with the macrons; they are purely a lexicographer's convention. If that's also the case for tone markings in pinyin, then by all means we should eschew tone markings. But in any event, we shouldn't get into the trap of having entries for "words" that are never used for communication in any language. We are the dictionary of all words in all languages, not the dictionary of all words in all languages transliterated into all possible writing systems. -- Visviva 13:27, 5 May 2008 (UTC)
Let's not forget that pinyin is the official transliteration system even in China. Also, why discriminate against "learners" in favor of native speakers? What good is the English wiktionary going to do for native speakers of Chinese? Unless they're learners :) — [ ric ] opiaterein — 14:33, 5 May 2008 (UTC)
The point is that all words have to pass WT:CFI, which means basically that they have to be verifiably used to convey meaning in the given language. People trying to learn a language online are not a valid source of information here (they are a big part of our target user base, but that's another thing entirely). Treating interlanguage as a language in its own right makes sense in studies of second language acquisition, but it is not a very useful approach for lexicographers to take. Also, I may be wrong, but I'm fairly sure the Pinyin which is official in the PRC uses tone marks.
Anyway, sorry to have driven this off-course ... while I remain dubious of Pinyin entries in general, what I really want to know about is what the community thinks of these nonstandard, ad-hoc Pinyin entries. Is there some unique rationale for keeping these entries that would not apply to any ad-hoc romanization of any language written in non-Roman script? -- Visviva 14:52, 5 May 2008 (UTC)
I used to try to create separate pinyin entries, but no longer do so. If someone else creates a pinyin entry, I make an attempt to correctly format it (time permitting). The reason I no longer create pinyin entries is that if you create a proper Mandarin entry using simplified or traditional characters, you should be able to type pinyin into the search box, and find what you're looking for.
As for the "lvxing" spelling, it is not "legitimate" pinyin. It should be lüxing, or more correctly, lǚxíng (旅行). If I were to use a Pinyin-based IME to type 旅行, I would have to type "lvxing" in order to get what I want. Most English keyboards don't come with a "ü," so many IMEs substitute a "v" for purposes of typing. -- A-cai 13:51, 5 May 2008 (UTC)
Aha! Thanks for that info. Is there a good way to note this in the 旅行 entry (and others, as appropriate)? That could help to resolve the anon's concern at Talk:lvxing. -- Visviva 13:57, 5 May 2008 (UTC)
As a Chinese learner, I often want to verify a word I've learned. For example, looking up wanshang (without tone markings, which are hard to type in the search box). So I don't think such entries should be deleted; they can redirect (except in the case when there is more than one word for a single romanized spelling). 24.29.228.33 16:21, 5 May 2008 (UTC)
Whenever you edit an entry, at the very bottom you have a drop-down list with a Pinyin section that contains characters with tone marks, which you can insert by clicking. Search results should include entries with transliterations with tone marks, even when you don't type them explicitly. --Ivan Štambuk 19:13, 5 May 2008 (UTC)
No, they can't redirect, because there is every likelihood that wanshang (inter alia) is an actual word in another of the thousands of languages we seek to cover.
If the non-marked Pinyin is included in the relevant entries for real words (real Pinyin and Hanzi), those pages will appear in both internal and external search results. Would that be sufficient? -- Visviva 09:10, 6 May 2008 (UTC)
  • Tone-marked Pinyin is very hard to input, especially for beginners in Chinese. Non-tone-marked Pinyin, however, is convenient for processing. In practice, the tones are marked in the content of the entry (for example: tongyi).
  • User-friendliness is great, but should not come at the cost of allowing entries for non-words. There has to be a better solution. -- Visviva 09:10, 6 May 2008 (UTC)
Completely agree with Visviva. While I've never been a huge fan of having transliteration entries at all, people keep saying that Mandarin's a special exception, and I'm willing to take them at their word on that. However, there absolutely needs to be a standard. That standard should be whatever it is that people are actually using to communicate, be it with accents, without, whatever. And if both are used, then we need to pick one of them, because having two sets of transliteration entries is simply too much. -Atelaes λάλει ἐμοί 17:02, 6 May 2008 (UTC)
It is not an exception. The pinyin entries are there not because they are transliterations, but because pinyin is very often used to write Mandarin. Chats, IRC, SMS messages, email, etc frequently use pinyin (usually sans tone markings). We don't want any "transliteration" entries. We have entries for the single syllable words with and without tone markings (this is a finite set, about 1700 IIRC), we should have entries for common words often written in pinyin; this is what (e.g.) A-Cai has been doing. Robert Ullmann 14:48, 9 May 2008 (UTC)
Just to reinforce Robert's point, here is a link to a picture of a book cover (note the Pinyin without tone marks). I occasionally cite this dictionary in my entries. Another use of Pinyin, which Robert did not mention (but seems worthy of notice), is in URLs. For example: http://www.kexue.com.cn (kexue means science). -- A-cai 11:39, 10 May 2008 (UTC)
Well, people do lots of funny things on book covers. Does the dictionary also use pinyin without tone markings in the entries? That would be interesting. I have a hanja dictionary that includes pinyin (along with kana), but it uses tone markings.
The URL argument would surely apply to all romanizations, including those of Korean, Arabic, etc. Probably not a road we want to go down. :-) -- Visviva 12:15, 10 May 2008 (UTC)
I'm not disagreeing -- I honestly don't have enough information -- but I'm troubled that in the several times this issue has been raised, not one verifiable case of Pinyin being used for authentic communication among native or native-like speakers of Mandarin has been provided. Obviously chats and IRC aren't normally archived in a durable (or even non-durable) fashion. But we don't normally accept words under these conditions. -- Visviva 12:15, 10 May 2008 (UTC)
I'm only half-heartedly defending Pinyin entries. In truth, if we have them at all, I would be more in favor of them being created by a bot (i.e. converted from a simplified or traditional entry). The fact of the matter is that a number of contributors have added Pinyin entries (with and without tones). The real question is whether we want to encourage or discourage them from contributing in this way. Personally, I've always felt that multiple entries for a single word is one of Wiktionary's drawbacks. It creates too much busy work for contributors (particularly in Chinese), and often results in multiple inconsistent entries for the same word (no matter how hard I try to sync them up). However, given Wiktionary's current technical limitations, I'm not sure that we have another good option to multiple entries. -- A-cai 12:55, 10 May 2008 (UTC)

template:hockey

Following the RFDO discussion that resulted in terms relating to field hockey now being categorised in category:Field hockey rather than directly in category:Hockey, this template now labels entries as (field hockey). Thus I propose it should be renamed to template:field hockey. I would just do this, but I'm unsure if this would cause any issues for the articles that transclude it. Additionally, I'm not certain that we would want to keep the resultant redirect. Thryduulf 12:00, 5 May 2008 (UTC)

Done We definitely should keep the resultant redirect, I would (as I said on RFD) never refer to Field hockey as anything other than "hockey". Conrad.Irwin 16:24, 5 May 2008 (UTC)
As a Canadian, I take exception. Hockey always means ice hockey, and this view is supported by the Canadian Oxford Dictionary. The primary sense is our unofficial national sport, while the other sense is a mere Britishism. :-)
I suggest creating a neutral template:hockey which places entries in category:Hockey, where they can be easily found and assigned to the correct subcategory(ies). Michael Z. 2008-05-05 17:03 z
Would that category be for words that are used in the contexts of all forms of hockey, or for no words at all? Either way, the category text should state the category's purpose clearly, so that people familiar with only one type of hockey or the other don't assume it's talking about their type.—msh210 17:33, 5 May 2008 (UTC)
Good question. Do we prefer to see the general (hockey), or the wordier (ice hockey, field hockey) in a sense? Personally I think it is best to reduce the number of unique terms used, and remain unambiguous, so I think the latter might be preferable. If we see (hockey), we may not know whether a Canadian or British editor meant only the type they are familiar with, or hockey in general.
I don't know enough about field hockey to compare, but the terminology looks to have a lot of differences. Terms which overlap are goalie, hockey stick, wing, hook. Michael Z. 2008-05-05 18:09 z
I don't know huge amounts about either sport, but I know a little more about field hockey. Basically they are different sports that have evolved from a common premise (i.e. a team sport the object of which is to score goals by using a stick to hit a small object into a net) - they are different enough that even for something as basic as hockey stick, we need separate definitions. I wouldn't object to keeping template:hockey as a way to categorise words temporarily until they can be sorted into the correct sport. "Field hockey" is not a term that I have ever seen or heard in the UK, so it wouldn't be intuitive to British editors to categorise their word such. I guess the same may be true of "ice hockey" in North America? Thryduulf 18:29, 5 May 2008 (UTC)
Ice hockey is heard in Canada, but rarely used, except when it is specifically needed to differentiate from the variations floor hockey (e.g. in gym class, with a light plastic puck), field hockey, street hockey, etc. The CanOD's definition of ice hockey is "= hockey 1". I believe that British-style field hockey is played, but it is unfamiliar, and most Canadians would assume that field hockey is just ice hockey played outdoors in summer = ball hockey.
I don't know if this holds true in the central and southern USA, where winter ice rinks aren't ubiquitous. Michael Z. 2008-05-05 18:42 z


By the way, it sounds like the main definition should be moved from field hockey to hockey (2). Michael Z. 2008-05-05 19:19 z

Yes it should. 19:31, 5 May 2008 (UTC)
Done, please review. Michael Z. 2008-05-05 20:27 z


One more: please review street hockey, to which I added the Canadian form. These could be reasonably combined into a single definition based on hockey, but that would be treating the two different senses as one, and worse, presenting two distinct games of street hockey as one. Michael Z. 2008-05-05 20:49 z

They all look good to me. Thryduulf 21:23, 5 May 2008 (UTC)

While I have your attention, please check the descriptions at hockey stick ("primary implement" just didn't seem that useful, and there was no indication that they were different). Michael Z. 2008-05-05 22:37 z

Interwiki links to redirects

There is some debate about whether we should use interwiki links to link to redirects on foreign Wiktionaries. In particular the issue is centered around User:RobotGMwikt, which is currently set to remove interwiki links that point to redirects, though I hasten to add that the underlying issue is far more important than the bot issue (for this discussion). As this has been raging on IRC for the last 48 hours, I hope that posting it here will help to resolve the situation.

For those who don't know, the interwiki links are used on Wiktionary to link pages with exactly the same title on each Wiktionary. i.e. our entry hello links to the French hello. An issue presents itself when the other Wiktionary has a redirect at the page title, should we link to it (on the grounds that there is definitely some information there) or not link to it (on the grounds that it is not the kind of information that people are expecting from the interwiki links). There are no doubt stronger arguments both ways, and GerardM has written a summary of his thoughts at http://ultimategerardm.blogspot.com/2008/05/robotgmwikt.html which are worth reading before entering the discussion.

I would prefer if redirects on other wiktionaries were not interwiki-linked to. If they were real words, why are they then redirects? And if they aren't real words, why imply that you can get somewhere by clicking the interwiki link? I think this will increase as wiktionaries grow. ~ Dodde 22:56, 5 May 2008 (UTC)
Likewise, I would not want links to redirects. I can imagine hypothetical cases where iw links to redirects might be desirable, but to date I have not encountered such cases except as mistakes. --EncycloPetey 23:19, 5 May 2008 (UTC)
I do want redirect-links. If another Wiktionary sees fit to make use of a redirect, then I see fit for us to respect that use, link to that redirect, etc. This is especially true because in many cases, it's fairly arbitrary which entry is the redirect and which is the main one; for example, our don’t is a redirect to don't, but another Wiktionary might well do the reverse. Would y'all suggest that our entries shouldn't link to each other? —RuakhTALK 00:31, 6 May 2008 (UTC)
Personally I think soft redirects are better suited for intended redirects, because you are able to explain why you are redirecting and let the user stop guessing, and as such they will qualify for interwiki links anyway. What a hard redirect means differs so much between projects, and also from case to case, that I see no reason to include these pages in the web of interwiki links. Some interwiki links could probably be argued to be justified, but I am afraid a lot more would not, and this would imho end with more confusion than clarity. You always have to be aware of what the redirect means on that particular wiktionary, if any system is present at all. ~ Dodde 00:59, 6 May 2008 (UTC)
I agree with Ruakh. The whole point of linking to redirects is precisely because other wikis use redirects differently, and no one Wiktionary should dictate their use. If another wiki wants to redirect all alternative spellings, or plurals, or whatever, to a single article, we shouldn't then remove all links to those redirects, as if that wiki doesn't have the content. Likewise, in the rare cases where redirects are used on en.wikt, aside from the conversion script's little droppings, they are done to consciously take someone searching for one thing to the page where the content actually is. Dmcdevit·t 01:18, 6 May 2008 (UTC)
A good way to look at this is to view it as if you are deciding what to do on another wikt about links to the en.wikt. If you have a local entry for an idiom (one of the cases where we use redirects), but not in the same canonical form (or "citation form") as the en.wikt, would you want to link to the redirect? If your entry is apple of one's eye do you want to link to our redirect? Of course, you aren't as likely to have apple of somebody's eye. Likewise if you have Arabic or Hebrew forms that we have redirected to the forms w/o vowel markings, and so on. In the same way, when the FL wikt redirects forms or variants, we want to link to them, respecting whatever policy they have. If the FL wikt changes something, we just link to whatever they are doing (and see next section). Robert Ullmann 11:08, 6 May 2008 (UTC)
I agree with the above points to include links to redirects, since there is no way to know whether the redirect is useful or not. Of course, an option would be to immediately link to the page which is redirected to. Would that be a problem?
Note that at the same time I am a fan of soft redirects as well, it’s just that for some cases, they don’t make sense, as Robert pointed out. H. (talk) 07:54, 9 May 2008 (UTC)
I believe that each language community should be able to decide how to structure their data: if they want to use soft redirects or hard ones, what to do with alternate spellings or clitic forms, whether to include romanized entries or not, and all of these things mean that we should allow iwikis to redirects. Sure, there are going to be some mistakes. Over time, those will be fewer and fewer (we hope). Right now we have iwikis from a page which has a word in one language to a page on another wiki with no entry for the same language. That's an iwiki that's not so helpful; but no one suggests doing away with them. We just figure that over time we'll get it right. -- ArielGlenn 20:46, 14 May 2008 (UTC)
What you may or may not do in the future is a hand-waving exercise. We are talking about the current state of play. Currently there are four solid reasons why we should not refer to redirect pages and your argument does not diminish any of them. GerardM 07:38, 15 May 2008 (UTC)
Forgive me, but we have two solid reasons why a Wiktionary shouldn't (in general) use them (our multilingual nature and the issue of homonymy), one solid reason why a lot of Wiktionaries have them anyway (case conversion), and one supposed reason why we shouldn't link to them (a claim that they don't really have the target entry anyway). Ariel's argument diminishes none of them because none of them needs diminishing: they range from petty to irrelevant. —RuakhTALK 22:35, 16 May 2008 (UTC)

There is no way in which you can distinguish between intended and unintended redirects. This is why the argument is moot. GerardM 10:34, 13 May 2008 (UTC)

I'm not following this. Bots cannot make the distinction (presumably), but humans can; so these should probably not be added by bot, but there is no reason they cannot be added by humans. -- Visviva 11:00, 13 May 2008 (UTC)
If you suggest that all interwiki links are to be created by humans I think you are completely right. In that case we do not need to argue about the algorithm used by bots. GerardM 13:51, 13 May 2008 (UTC)
So is a middle path here to allow people to add iw-links to redirects manually in specific cases, while bots shouldn't add or remove iw-links to redirects? It seems the argument to include iw-links to redirects is that we might miss a useful link here and there, but the negative effect of adding a lot of "false" iw-links seems at the same time to be completely overlooked. Allowing these to be added manually will have the positive effect that iw-links are added only where there is a good reason, and where there is not, we avoid a lot of "crap" iw-links. (Regarding the bot, is this kind of distinction possible/easy to implement?) ~ Dodde 14:32, 16 May 2008 (UTC)
If in many Wiktionaries, "bad" redirects continue to exist erroneously after ConversionScript, iwiki linking to them will draw attention to that and in the long run improve quality. There's a reason red-links are red; it draws attention to ways to possibly improve Wiktionary. Interwiki links to unwanted redirects are not desirable, but the problem is the bad redirect, not that we link to it. Don't hide the error, let each Wiktionary determine how to use redirects, have the wiki bots link to them, and when we iwiki to a "bad" one, let someone clean it up. --Bequw¢τ 15:50, 17 May 2008 (UTC)
I have given this some thought, and it is possible that my mind has been affected by narrowness to some extent. I have taken in some of the arguments for allowing redirects to be iw-linked to, and all in all I think I agree more than disagree now that redirects should be linked to, given the variety in how different language editions of Wiktionary choose to use their redirects within the project. It's not just a matter of using the character ' or ´, but the way of presenting some words in determined form or not (like some languages names: Canarias - or - Las Canarias etc.) - probably there are quite a few examples of similar differences between language editions of Wiktionary. I understand it was quite some time since this was discussed, but since I was one of those arguing against iw-linking to redirects I felt compelled to acknowledge my change of mind. ~ Dodde 03:03, 14 July 2008 (UTC)

Interwicket and Arabic

Why is Interwicket removing so many interwiki links to ar? I've checked, and the links do not exist, so the bot seems to be functioning properly. Did a mass deletion happen at ar.wiktionary.org? Anyone know what's happened? --02:06, 6 May 2008 (UTC)

ar:User:Lord Anubis did remove a large set of entries, all capitalized forms (Destiny, Comoros, etc) so the bot is functioning properly. Why they were removed is not known (the edit summary is "Bot: deleting a list of files", very helpful); I've dropped the user a note expressing curiosity. They were not uc→lc redirects: the would-be lc targets don't exist in at least some cases I've looked at. (e.g. ar:destiny doesn't exist) In any case, not our problem? (;-) Robert Ullmann 10:46, 6 May 2008 (UTC)
The entries I'm noticing are for proper names of stars (e.g. Algol, Deneb, etc.) and constellations (e.g. Cancer). And earlier today the link to Deutsch disappeared. Do you suppose they've eliminated capitalization altogether? --EncycloPetey 13:22, 6 May 2008 (UTC)
There are lots of words, all capitalized, see log there. But not redirects, since lc form isn't there (deleted Crossover, but no crossover). So it was some sort of content page? Perhaps a bunch of stuff imported a long time ago that they decided to just trash? No way to tell. Robert Ullmann 13:32, 6 May 2008 (UTC)
We were removing content that was imported from GPL-licensed lists, because GPL is not compatible with GFDL.
We are trying to get these lists licensed under a dual GFDL/GPL license, but till this is done, we cannot use the content on Arabic Wiktionary.
It's all under control :).
Oh and btw, the edit summary was initially "Deleting GFDL-incompatible files", but my bot goes crazy sometimes. :).

--Lord Anubis 15:43, 7 May 2008 (UTC)

Thank you for the explanation (:-) Robert Ullmann 15:44, 7 May 2008 (UTC)
Oh yeah, and next time we add them, we'll ensure that they are not unnecessarily capitalised.--Lord Anubis 15:45, 7 May 2008 (UTC)

proposed vote on inclusion of WMF jargon in the main namespace

Per a discussion on RFV, I have proposed a vote on the inclusion of WMF jargon into the main namespace. Please change its wording as needed and comment (there, not here).—msh210 19:02, 10 April 2008 (UTC)

I've now modified it; please have a look.—msh210 18:26, 1 May 2008 (UTC)
And now it's live.—msh210 16:04, 8 May 2008 (UTC)

Appendix:List of Latin phrases

This is useful, but should it be moved to Appendix:List of Latin phrases in English? Which is what it seems to be. Widsith 11:42, 9 May 2008 (UTC)

Only if these phrases are not used in other languages. I can't say because I'm uninformed on the possible use of these phrases in French, German, Polish, etc. However, on a quick look, they seem to be phrases that would likely have been used either in conversational Classical Latin or written Latin of the medieval and later periods. --EncycloPetey 13:39, 9 May 2008 (UTC)
To call it something other than its current name seems premature. What we have is a list of Latin phrases some of which may sometimes be embedded in English text, with English translations and English commentary. I consider it a useful document for adding new entries and for facilitating certain kinds of searches. It is likely to have other uses. Verification that the majority of the phrases appear in English, let alone other languages, is not available.
If the entries for the listed items are done under the Latin L2 heading, should we indicate that the headword was commonly used in English? Does that fact need to be attested? DCDuring TALK 14:08, 9 May 2008 (UTC)
  • A list of phrases which simply existed in Latin would already seem to be covered by Category:Latin phrases. Whether or not they exist in French/Polish etc does not seem to be addressed by the Appendix, which translates them into English and explains them in English. So it seems to me that the Appendix is designed to list all the Latin phrases which are used by English writers, and its current pagename is confusing (to me at least) because of its apparent crossover with Category:Latin phrases. Widsith 15:43, 9 May 2008 (UTC)
    How did you conclude that the Appendix is designed for that? Solely because it explains them in English? This is the English Wiktionary, so everything is explained in English. I don't see any evidence on the Appendix page that indicates the list was developed specifically for terms that appear in otherwise English texts. --EncycloPetey 17:58, 9 May 2008 (UTC)
OK. So does that mean its content will be the same as that of Category:Latin phrases? Widsith 18:03, 9 May 2008 (UTC)
No. The main namespace requires entries to have 3 citations (or 1 in some cases), and forbids entries that are mere sum of parts. An Appendix is often freer in what it permits, and may include items that would not be included in the main namespace. --EncycloPetey 21:47, 9 May 2008 (UTC)

Yeah, I don't know if there is a limited source or sources, but the list appears to be a compilation of phrases used in English as well as specific mottos and quotations (e.g., "cave canem" from a Pompeiian doormat). I suspect it's too late to apply a specified scope to this large list, but it could be split off into other more specific lists, if someone wants to take it on as a project. Best just to let it continue to grow, and continue to apply the normal attestation requirements on Wiktionary entries for both Latin and "English" Latin terms. Michael Z. 2008-05-09 22:59 z

  • The reason I asked is because you can get those dictionaries of Latin terms in English, and I thought this was maybe our own useful version of such a thing. But apparently not. Widsith 07:38, 10 May 2008 (UTC)

Fixing wikisyntax typos

I've created a punch list of mis-matched ( ), [ ], and { } in entries; they are often very hard to notice even when looking right at them. If anyone would like to help fix them, see User:Robert Ullmann/Mismatched wikisyntax and of course comments and suggestions are wanted. There may also be other things it can look for. Robert Ullmann 14:36, 9 May 2008 (UTC)
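For anyone wanting to try this kind of check on a single entry, here is a minimal sketch in Python of a stack-based bracket matcher. It is not the code behind the punch list above, whose method is not described here; the function name find_mismatches and the sample strings are made up for illustration. It reports the position of every ( ), [ ] or { } character that has no properly nested partner.

```python
# Hypothetical sketch, not the actual tool behind the punch list.
# Maps each closing bracket to the opener it must pair with.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_mismatches(text):
    """Return sorted (position, character) pairs for unmatched brackets."""
    stack = []     # (position, opener) entries still waiting for a closer
    problems = []  # closers that did not match the most recent opener
    for pos, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((pos, ch))
        elif ch in PAIRS:
            if stack and stack[-1][1] == PAIRS[ch]:
                stack.pop()  # properly nested pair
            else:
                problems.append((pos, ch))
    # Any opener left on the stack was never closed.
    problems.extend(stack)
    return sorted(problems)


if __name__ == "__main__":
    for sample in ["{{form of|[[aphetic]] form}}", "[[hello]", "a) b"]:
        print(repr(sample), find_mismatches(sample))
```

A plain matcher like this will also flag brackets that are intentionally unbalanced, such as a smiley ":-)" or a numbered "1)" list, so its output is best treated as a list of candidates for a human to review rather than a list of errors.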

in vacuo

I have made an initial entry under the English heading. But our colleague EP suggests there could be a case for a Translingual header. What is the general opinion? Is this term widely enough used in most languages? -- Algrif 14:06, 10 May 2008 (UTC)

Google finds a few instances of the phrase in German, Dutch, and Italian Wikipedia, so it's possible that it is more widely used (although the vast majority are on en: and la:).
Is there a guideline with our definition of translingual: how many languages do we need attestations in to call it translingual, rather than simply a Latin borrowing into several languages? Or do we reserve the designation for things which are more inherently universal, like chemical symbols and proper names of species? (see children of Category:Translingual.) Michael Z. 2008-05-10 19:05 z
Well, it can't be a Latin borrowing if it's not a set phrase in Latin. In Latin, this entry is merely sum of parts, and so would not merit an entry. However, if it occurs in the middle of texts of various languages in this set form, then we have a case for a Translingual entry. There are a large number of chemical symbols and scientific names of taxa that are translingual, yes, but also some abstract symbols, numbers, and some abbreviations and codes. There are also a few phrases or abbreviations of Latin origin, like sp., spp., etc. that have been adopted into many languages. --EncycloPetey 19:15, 10 May 2008 (UTC)
Perhaps it was a set phrase in the sciences, when European scientists still spoke and wrote in Latin. Michael Z. 2008-05-10 21:48 z
WT:ELE says "this heading includes terms that remain the same in all languages. The symbols for the chemical elements and the abbreviations for international units of measurement are but two examples of translingual terms" (my emph.). We should find attestations in a diverse selection of widely-used languages, say, Chinese, Spanish, Arabic, Hindi, and Russian, before we can conclude that it is truly translingual.
I guessed that in Cyrillic it might be ин вакуо, but only found a single Russian citation on the web, which appears to be quoting some Latin text. Of course, it might be Cyrillicized differently. Michael Z. 2008-05-10 19:20 z
Hm, things like chemical symbols, math, taxonomic names, metric units, internet top-level domain names are truly translingual, used everywhere. Perhaps etcetera is too, but I'm skeptical about things like sp., spp., for which other languages have their own names (e.g., Ukrainian вид, Turkish tür). Michael Z. 2008-05-10 19:28 z
I think we have interpreted Translingual to include terms in "scientific Latin" that have achieved some acceptance in the international scientific community. This doesn't seem clearly consistent with the phrase "in all languages" in WT:ELE.
This works pretty well for the taxonomic names and possibly for the language used to describe species and specimens. The extension to a term like "in vacuo" is a more modest stretch from the descriptive language used in botany. OTOH, EP has instructed me that the adjectives used in species names (eg, multiflora, latifolia, carolinensis) are Latin, albeit New Latin, not Translingual. DCDuring TALK 20:41, 10 May 2008 (UTC)
Oops, now that I think of it, Russian documents would probably write in vacuo in Latin characters, since they are not as foreign as Cyrillic characters are to English readers. I don't really read the language, but there appear to be a few cases of this in the first couple of pages of Russian-language search results. Michael Z. 2008-05-10 21:33 z


  • Surely it will have different pronunciations in different languages? Widsith 20:49, 10 May 2008 (UTC)
    How is that different from the chemical symbols for the elements? In English Hg is pronounced [eɪtʃ dʒiː], but this is not how it would be pronounced in French or Spanish. The scientific name for the Asteraceae (sunflower family) is pronounced differently in different countries as well. The "Translingual" label indicates only that the written form is common to many languages, and does not speak to the pronunciation. I know of no Translingual entries that would have the same pronunciation in multiple languages. --EncycloPetey 21:08, 10 May 2008 (UTC)
We have:
  1. unpronounceable or unpronounced (g2g) entries
  2. symbols (eg letters, digits) that do not have their own pronunciation, instead taking it from the associated word
  3. multiple pronunciations for the same word in the same language.
I don't think that pronunciation has enough muscle to determine this. DCDuring TALK 21:16, 10 May 2008 (UTC)
Also note that a symbol is different from a word. Hg can be spelled /eɪtʃ dʒiː/ or simply read /mṛkjuri/ in English, corresponding to two different pronunciations, /ha ge/ or /rtutʲ/ in Ukrainian. The Latin (translingual?) term in vacuo may be pronounced something like /ɪn vækjuo:/ in English, and practically identically /in vakjuo/, if read from a Ukrainian text. Michael Z. 2008-05-10 21:33 z

Perhaps the yardstick for translinguality is when something becomes a symbol, and is released from the restrictions of pronunciation in its original language. Asteraceae is still a Latin word: Canadian /æstəɹ'eɪjsiə/ or Eastern European /asterat͡s'eja/ are still examples of people reading Latin with their own accent. But $, mm, Hg, 42, °, =, .de would be spoken in the local language, and are going to need a very large "pronunciation" section. (.com may be an exception, because it is an acronym "dot com", not "dot see o em"). Michael Z. 2008-05-10 22:06 z

  • The fact that several languages may or may not use a Latin term does not make the term Translingual. It's totally different from a Chemical symbol like Hg. Someone writing in Taino or Xhosa can use only the symbol Hg if they are composing a professional chemical document. There is no analogous situation for a phrase like in vacuo, which I simply do not believe can be valid usage in every language in the world. Widsith 22:47, 10 May 2008 (UTC)
First, I hope that we are not challenging the labelling of taxonomic names as Translingual.
Is in vacuo like carolinensis (which is Latin, per EP)?
  • If so, then in vacuo is Latin. If it is Latin, we then need to determine whether it meets WT:CFI. I would propose that, on the one hand, it is Latin, but, on the other hand, it is not SoP because it is a set phrase in its use embedded in other languages, where it is used by those who might not be able to decipher it by its Latin components and know it mostly as a phrase and, accidentally, by its similarities to words in their language (for English speakers: in and vacuum).
  • To me it seems easier if they were both deemed Translingual based on their use in scientific literature and a separate determination were made of their qualification as Latin (use in epigrams (?), religious documents, and other modern Latin usage for such New Latin words). DCDuring TALK 01:33, 11 May 2008 (UTC)
A comparison between carolinensis and in vacuo is not appropriate. The term carolinensis is used regularly in Latin contexts. This happens in the Latin circumscriptions of newly described species, which are a requirement for legitimate publication of a botanical species. It also occurs in botanical texts of Linnaeus and others, who were still publishing in Latin. The adjective carolinensis also declines as a Latin adjective in these publications. By contrast, in vacuo is a prepositional phrase, so it will exhibit no inflection. If it is occurring as a set phrase in multiple non-Latin languages, then it is doing something that carolinensis is not. The adjective carolinensis does not migrate into non-Latin languages except as a component of a proper noun naming a species; it never crosses over as a word in isolation. We are saying that in vacuo does cross over as a unit. So, if we want to draw comparisons, we need to find other non-inflecting phrases for comparison, such as caveat emptor, ad hominem, or sub nomine. --EncycloPetey 01:51, 11 May 2008 (UTC)
  • Translingual is not panlingual -- it was not so long ago that we reviewed the fact that taxonomic names are not panlingual (East Asian and other languages use homegrown terms of equivalent specificity). It has been proposed somewhere that "translingual" denote a term used in three reasonably disparate languages. In vacuo would seem to meet that standard, although I'm not sure if it is used in any non-Indo-European languages (maybe Hungarian or Finnish?). -- Visviva 03:39, 11 May 2008 (UTC)
    How about Japanese: [4]. --EncycloPetey 03:46, 11 May 2008 (UTC)
  • Personally I think that misses the point. Translingual terms to me are terms which are used "by the international community", i.e. practically speaking within fields such as the sciences which have internationally-recognised terminology – IPA symbols, binomial classification, chemical symbols etc. Latin phrases seem to me to be a different kind of thing altogether. I mean the English word bar is used with identical meaning by dozens of languages around the world. Is it Translingual? Widsith 07:40, 11 May 2008 (UTC)
    No, because bar inflects differently in different languages, so it isn't the same across languages. By contrast, see the example above of in vacuo used in a scientific article that is primarily in Japanese. This indicates acceptance and use "by the international community". --EncycloPetey 13:14, 11 May 2008 (UTC)
  • Thanks all. Having gone through the pros and cons, I have changed it to Translingual, particularly noting that this heading does not mean Panlingual. I would appreciate any Latin input to improve the entry. -- Algrif 19:38, 24 May 2008 (UTC)

Category:New words from Wikipedia

Which contains Citations: namespace pages.

I've got a new minion collecting words used in the 'pedia which we don't have. The citations probably don't meet CFI for strict 3-independent-source criteria (e.g. if RfV'd), but do provide one usage example and context. It isn't automatic, anything I'm suspicious of gets checked by me in Google, or just skipped. (I have code to tell it to correct spelling in wikipedia when it finds an error and I tell it the correct spelling; but they won't let Python edit ... so I just leave them ...)

Did you know we didn't have ethernet? (Citations:ethernet)

We have only 180K or so English entries, there are 500K missing to get to the Random House Unabridged (which is what I had under my bed growing up, I would lie there and read, and reach under and drag it out when I met a word I didn't know. I learned almost every word in it in the process :-). Lots more to find. Robert Ullmann 00:52, 11 May 2008 (UTC)

What is the mechanism for removing items from the list as entries are made? I would suggest a two-step process. First we need to check whether we have the lemma form of the word. If we do not, then there is more work - and more value.
The first non-lemma I tried modded led to an incorrect inflection of mod (one "d"), which was useful to correct. The second, crossbands led to a missing lemma.
This seems like a good way to generate some new entries. It would be interesting to find potential entries that were on multiple lists of wanted entries, especially lemmas. DCDuring TALK 01:52, 11 May 2008 (UTC)
When the entry exists, the Citations: page will no longer appear in this category. (modulo some job-queue updating that doesn't seem to find updates; they changed #ifexist to put the links in the page table, breaking Special:Wantedpages, but then didn't make it do what was intended! I had to purge Citations:scute to make it disappear from the cat). You are quite right, it leads to finding "lemmas" that are wrong (mod, you missed moding by CheatBot ;-) or incomplete (restructuring should have a noun sense).
Just to point it out clearly: these citations do not meet the strict CFI-3-independent-use requirement for an RfV; they are simply helpful and/or illustrative.
The method of finding words in the WP might shock you ;-) Robert Ullmann 12:21, 11 May 2008 (UTC)
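The workflow discussed in this thread (collect words that have no entry yet, keep one usage example for each, then check whether the lemma exists) can be sketched roughly as follows. This is only an illustrative sketch, not the actual method used by the bot, which is not described here. The function names, the naive sentence splitter, and the suffix-stripping rules are all assumptions made for the example.

```python
import re

def find_missing_words(text, existing_entries):
    """Return {word: first sentence using it} for words with no entry yet."""
    missing = {}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word not in existing_entries and word not in missing:
                missing[word] = sentence
    return missing

def candidate_lemmas(word):
    """Guess possible lemma forms by stripping common English endings."""
    candidates = []
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            candidates.append(stem)
            # Undo consonant doubling, e.g. "modd" -> "mod".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])
    return candidates

def needs_lemma(word, existing_entries):
    """True when the word looks inflected and no guessed lemma has an entry."""
    guesses = candidate_lemmas(word)
    return bool(guesses) and not any(g in existing_entries for g in guesses)
```

With a toy entry list that contains mod but not crossband, modded would be reported as a missing form of an existing lemma, while crossbands would be flagged as pointing to a missing lemma. A real run would still need the manual checking described above.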

aphetic form of

Please see scoriating. I have not found an "aphetic of" template. What categorization should be used? --Panda10 14:35, 11 May 2008 (UTC)

The general template {{form of}} allows you to do this:
# {{form of|[[aphetic|Aphetic]] form|[[excoriating]]}}
--EncycloPetey 14:50, 11 May 2008 (UTC)
Thanks. --Panda10 14:55, 11 May 2008 (UTC)

Category for moving along on foot

I'd like to create a category for entries that mean moving along on foot. I have about 80 words/expressions on my list. Could you help me with the category name and its place in the category tree? Thanks. --Panda10 15:06, 11 May 2008 (UTC)

Somewhere under Category:Movement, I suppose. Not sure what to call it -- "Foot transport"? -- Visviva 16:09, 11 May 2008 (UTC)
Category:Human gaits looks appropriate. Mike Dillon 16:14, 11 May 2008 (UTC)
Thanks. I will use that. --Panda10 16:49, 11 May 2008 (UTC)

Category:Movement

This topical category is too ambiguous. I propose we do what Wikipedia has done and use Locomotion instead < parentage Biomechanics < parentage Physiology and Mechanics. __meco 06:13, 12 May 2008 (UTC)

I think our needs are rather different from Wikipedia's, particularly when it comes to these sorts of "everyday" concepts. The approach taken by WordNet is probably more suitable, at least for nouns (locomotion < movement < change(s) < action(s)). There may be other examples we would want to consider -- isn't there a map of the Roget's categories around here somewhere?
Anyway, I've said this before, I'll say it again -- Words Don't Have Topics. Our categories can map onto any number of lexical properties -- semantic relations (as here), discourse field, usage, etymology, etc., but they cannot map onto topics, because there is no meaningful association between (most) words and specific topics. Thorough category reform is needed, but presents serious challenges. -- Visviva 07:46, 12 May 2008 (UTC)
I see any move toward being more like Wikipedia as a step backwards :p — [ ric ] opiaterein — 17:38, 13 May 2008 (UTC)

Foreign terms

[See also #How should Wiktionary distinguish between two classes of non-English words?, above.]

How to indicate foreign words which are normally written in italics?

My paper dictionary (CanOD) italicizes a headword "if the word is originally a foreign word and not naturalized in English." The various inflection templates could have a parameter like loan=yes, to italicize them. This could be accomplished with an HTML class, so the display of such loanwords could be customized by wiktionarians. For example, from comme il faut:

Adjective

comme il faut (comparative more comme il faut, superlative most comme il faut)

  1. Proper...

Or does this need to be applied to specific senses? (I presume that each specific etymology is likely to either be normally italicized, or normally not.)

How do we define foreign terms? I suggest that attesting italicized use may be good enough. Do we need to distinguish several classes of them (beyond what is accomplished by adding context labels)? Michael Z. 2008-05-14 01:15 z

Our formatting is inconsistent enough that italicizing headwords, while a good idea, is probably not by itself sufficient to clarify anything, especially since (unlike in most print dictionaries) our headwords are not all close together in a way that makes variant formatting stand out. Similarly with example sentences and even quotations: it's not instantly obvious to a reader that our italics are meant faithfully. So, I'd support italicize=yes or something, but a stock usage note seems necessary as well. —RuakhTALK 01:44, 14 May 2008 (UTC)
Well, an advantage of italics is that it doesn't hit you in the face, but it is obvious when you look for it.
But where would you put a usage note? This seems to be something that belongs to the headword line, not to individual senses. Michael Z. 2008-05-14 02:06 z
Sorry, but I don't understand your question. Usage notes don't belong to individual senses; we sometimes use {{sense}} to indicate what sense they apply to (and usually clarify it in the text of the note as well), but that doesn't affect the placement of the note. And we have no shortage of usage notes that apply to all senses of a term. —RuakhTALK 02:42, 14 May 2008 (UTC)
Oops, I misunderstood. That makes sense. Michael Z. 2008-05-14 07:09 z
I once drafted a stock usage note at {{en-usage-foreignism}}. Thoughts & improvements thereon would be most welcome. -- Visviva 09:06, 15 May 2008 (UTC)

I'll work on implementing italicize=yes in the inflection templates. Anyone have comments or suggestions? Michael Z. 2008-05-14 19:22 z

If this is to be done, it should be done for form-of templates, too, no?—msh210 19:44, 14 May 2008 (UTC)
I think the inflection template is enough, since there is normally one present above the form-of template. But I'll think about where else we should italicize words. Michael Z. 2008-05-14 19:51 z
But we'd want consistency, no? Italicize all words that [whatever the criterion is]. That's whether they appear in inflection lines, in definition lines, or elsewhere. Or am I missing something? (Of course another consideration is that, I think, some people have the preference to view all form-of parameters in italics.)—msh210 2008-05-14 (9 Iyar 5768) 19:59:23 UTC
That might be nice, but might be impractical. It already collides with general use/mention italicization of English terms using {{term}}. Michael Z. 2008-05-14 20:04 z
Such words could be included in Category:English borrowed words. I'll think about adding a category hierarchy parallel to the etymological derivations for un-naturalized borrowings from one language to another. Michael Z. 2008-05-14 19:49 z
I think this is a bad idea. Firstly the parameter should (if this is to happen) not be called "italicise=yes" but something less format specific like "borrowed=yes" so that users don't think "I prefer italics, so I'll make my entries italic". Secondly, visiting readers will not know that italics isn't normal, if they even notice it is italic at all, and so we'd have to add the ====Usage Note==== anyway - making it redundant. Thirdly it makes everything just a little less consistent, and slightly more complicated - two areas which need to be moving in the opposite directions. Conrad.Irwin 20:34, 14 May 2008 (UTC)
Agreed. (My comments above on how to implement this all meant "if we use this".)—msh210 2008-05-14 (9 Iyar 5768) 20:46 UTC
Fair enough. borrowed=yes sounds good, because it refers to the linguistic concept and not its presentation. I'd also like to consider whether there is any sense in incorporating different types of loanwords (=borrowing, =calque, =reborrowing). But I think that information probably belongs in = Etymology = and this should incorporate the use of italics in English only.
I agree that it may be hard to notice—and I'm convinced that it is a great advantage to convey meaning in a way which is easy to understand, without adding any clutter or visual distraction (exactly what italics are meant for). It's used to good effect in some dictionaries (e.g. my Canadian Oxford), as are various typographic conventions in various dictionaries. This in no way mandates the addition of a usage note; on the contrary, it obviates that cluttering method which is not in use anyway (only 1 of 75 terms in Category:English borrowed words has such a note, and the particular one is contraindicated by my paper dictionary).
Whether it complicates the use of inflection templates is a question. Two possible problems: it remains absent from many foreign terms, or editors add it to naturalized English terms (we know how some hate "foreign" diacritics). We should specify some sort of attestation test for its application. Michael Z. 2008-05-14 21:15 z

I've created an initial draft proposal, which addresses some issues mentioned, but still leaves some questions open. Please keep general discussion here, and refer to User:Mzajac/Foreign terms. Michael Z. 2008-05-14 22:40 z
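As a rough illustration of the idea discussed above (a borrowed=yes parameter that marks the headword with an HTML class, leaving the italics to a stylesheet that readers can customize), here is a minimal sketch. It is not how the inflection templates are actually implemented; the function name and the class names "headword" and "unnaturalized" are hypothetical.

```python
from html import escape

def render_headword(headword, borrowed=False):
    """Render a headword; un-naturalized borrowings get an extra CSS class.

    The class names are made up for this sketch. The presentation would
    live in a stylesheet, for example:
        .unnaturalized { font-style: italic; }
    """
    classes = "headword unnaturalized" if borrowed else "headword"
    return f'<strong class="{classes}">{escape(headword)}</strong>'
```

Because the parameter names the linguistic property rather than the formatting, a reader who dislikes italics could restyle the class without any change to the entries.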

Regional language vs regional topics

Regional context tags like {{US}} are used to indicate different things:

  1. Regional spellings, as in labor:
    • labour (UK, Australia, New Zealand, Canada)
  2. Regional senses of words, as in station:
    1. (Newfoundland) A harbour or cove with a foreshore suitable for a facility to support nearby fishing.
  3. Regionalisms only used in a place, as in jambuster (labels the sole sense, but really refers to the entire entry):
    1. (Canada, Manitoba and north-western Ontario) A doughnut filled with jam.

The problem is that regional language and a regional context are two different things. SAS means one thing in a British context, and another in a Scandinavian one—but these are not examples of British English and "Scandinavian English". Both senses are used world-wide to refer to the particular things. Another example is горелка (gorelka), which is used in South Russia to refer to vodka. But it is also used in general Russian in a Ukrainian context, or to indicate a certain mood associated with Ukraine, because this word comes from the Ukrainian горілка (horilka). These would be indicated with (South Russia) and (Ukraine), respectively, but when you combine (South Russia, Ukraine) you can see how inadequately they convey two different messages.

Another unfortunate result is that categories like Category:Canadian English get full of words like Canada Day, Canuck, Montrealer, Robertson screwdriver, which are not restricted to Canadian English, but belong in Category:Canada, instead.

My paper dictionary (CanOD) uses geographical labels on a headword or a sense to indicate where a word is used, formatted the same way as other context labels.

  • bunny hug noun Cdn (Sask.) ...
  • 2 Cdn ...

But it uses a different style of comment to indicate regional context only:

  • FCC abbreviation 1 (in Canada) ... 2 (in the US) ...

Shouldn't we accommodate this distinction, and have one set of labels and categories for regional usage (US, British, Canadian, Newfoundland), and another for regional topics (in the US, in the UK, in Canada, in Newfoundland)? Michael Z. 2008-05-14 02:14 z

Perhaps they can be made more distinctive by some typographic treatment, e.g.:
British
(in the UK)
[Canadian]
 Michael Z. 2008-05-14 02:23 z
Perhaps we can have the regional topics included into the text of the definition (FCC: The United States Federal Communications Commission), with context tags used only for regional use. This would allow us to keep our current category scheme (i.e., that the regional {{context}} tags categorize as "Canadian English"). Then we'd need to have new categories for regional topics, as desired, and add them manually using [[Category:. These should be called, perhaps, Category:United States, et al.—msh210 17:49, 14 May 2008 (UTC)
Sounds like that works in principle, with a little tweaking of the category tree. I hope editors wouldn't continue to use the regional templates for everything, but good documentation on templates and category listings should help. I'd like to make a proposal with some specifics about the changes, and present it here (may be a week or two).
This is a significant change, so I'd like to see broad consensus for this. Anyone opposed? Michael Z. 2008-05-14 19:28 z
No opposition from me, I think it's a good idea. Thryduulf 20:14, 14 May 2008 (UTC)
Ditto. —RuakhTALK 21:03, 14 May 2008 (UTC)
This problem has also been bothering me. If we can get a clean explanation of both the technical and editing aspects, I'd certainly support this. --EncycloPetey 21:49, 14 May 2008 (UTC)

I've made some notes at User:Mzajac/Regional language vs regional topics. Still needs more detail, I think, and a few sets of eyes to find what's still lacking. Michael Z. 2008-05-15 04:30 z
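The distinction proposed in this thread, one set of labels for regional usage and another for regional topics, each with its own category, could be modelled along these lines. This is only a sketch of the idea: the table layout and function name are assumptions, and only the category names Category:Canadian English, Category:Canada and Category:United States come from the discussion above.

```python
# Regional usage: where the word itself is used (e.g. jambuster).
USAGE_LABELS = {
    "Canada": ("Canada", "Category:Canadian English"),
}

# Regional topic: where the thing referred to is found (e.g. FCC).
TOPIC_LABELS = {
    "Canada": ("in Canada", "Category:Canada"),
    "US": ("in the US", "Category:United States"),
}

def region_label(region, kind):
    """Return (display text, category) for a 'usage' or 'topic' label."""
    if kind == "usage":
        table = USAGE_LABELS
    elif kind == "topic":
        table = TOPIC_LABELS
    else:
        raise ValueError(f"unknown label kind: {kind!r}")
    text, category = table[region]
    return f"({text})", category
```

Under this scheme Canada Day would take the topic label and land in Category:Canada, while jambuster would take the usage label and stay in Category:Canadian English.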

Can someone else please be the bad guy

So KYPark and I have been butting heads a few times as of late. The most notable discussion can be seen at Talk:못하다, and there's also a bit at User talk:Dmcdevit#Deleted Category:Euro-Korean words. Additionally, I blocked them last year for the continued insertion of Korean-Germanic cognates into Korean entries. KYPark has a history of making edits which, in my opinion, use guerilla style tactics to push their point of view (a point of view which they are generally alone in), notably with Korean etymology and transliteration. However, they are also, again in my opinion, an excellent and highly skilled editor, which is what makes them so problematic, because it would be so much easier if they were worthless and I could simply put a long block on. So, I just noticed Citations:witch. The citation seems reasonable enough, but the etymology bit seems completely outside the bounds of what we want to have......anywhere on Wiktionary. My first instinct was to simply remove the content, but every time I've done something similar I've been accused of being a rogue bully admin. If the community agrees with my opinions on the matter, can someone else remove the content (and can we please remove KYPark from the whitelist so someone can keep tabs on their additions). If I am being a bully, please tell me so, and I will desist. -Atelaes λάλει ἐμοί 06:37, 14 May 2008 (UTC)

You are in fact and effect inviting the innocent to an evil age-old witch-hunt party. Just stop it right now, I say. I expected this proceeding, and edited the very Citations:witch in advance, just to suggest that you are unforgivably wrong and evil. Behave yourself, I warn you. Should you be told to do so by Wikt, it should be prepared ... --KYPark 11:11, 14 May 2008 (UTC)
This user should definitely never have been whitelisted. The best I can say is that many of his recent contributions have been simply value-neutral, not requiring immediate cleanup. -- Visviva 11:22, 14 May 2008 (UTC)
Non ety, non-citation content deleted. DCDuring TALK 11:43, 14 May 2008 (UTC)
Atelaes, could you talk a little more about your reasoning for blocking/deleting vs. posting something at WT:RFV or WT:RFD? Also, I think we should be using the {{fact}} template more often. For example, if you found his claim about the Indo-European thingy (I didn't read the whole debate in detail) to be dubious, you would insert the template so that everyone would know that it is an unverified claim (until a proper reference is given).
KYPark, you strike me as a non-native speaker of English. Just an impression, please correct me if I'm wrong. My initial feeling is that some of your posts come off as a bit defensive (clearly, you were unhappy about your stuff being deleted. An understandable reaction) or impolitic. I attribute some of this to a lack of ability with some of the finer points of English discourse. If your skills in English are not the problem, then it could be that you feel that your work is being attacked by non-experts. You must understand that we have no way of verifying anyone's level of knowledge. This is why it is important to cite credible references when entering potentially contentious information.
At this point, I'm trying to remain neutral. I'm doing my best to give both of you the benefit of the doubt. How will you respond to my post, I wonder? Will your response be diplomatic, sarcastic, funny, mean? I have no idea. What I do know is that a lot of people form opinions about a contributor after reading posts at places like Beer Parlour. -- A-cai 12:33, 14 May 2008 (UTC)
This is not to respond to A-cai at all. Please, please, don't be too excited by the fact that some Korean words sound like Western words.
What is the way you like best?
The note material at Citations:witch was not etymology and was not citations. It simply does not belong there. I believe that Mr. Park's judgment may possibly be impaired by the anger generated by the exchange leading him to revert my removal of it. I have rolled it back. I do not wish to get into a revert war. Could someone else take a look at the material and determine whether there might be another place where it can do Wiktionary some good. DCDuring TALK 14:56, 14 May 2008 (UTC)
I'm not sure if this is the correct line to put a response to A-cai, but hopefully they'll see it anyway. The problem with using {{fact}} for the Korean/Indo-European "cognates" is that it was simply too distributed, for one thing. This was not an assertion made on a Wikipedia article about the history of the Korean language, but rather was contained within the Etymology and Related terms sections of a number of entries (I'm not really sure about the exact number, you may have to ask Stephen about that, as I believe he did most of the cleanup). Now, I'll come right out and say that I know very little about the Korean language, however the initial claim of a relationship between Korean and European languages struck me as, well, surprising. I did a bit of research, talked with some other editors, and the conclusion I came to was that this was not a claim taken seriously at all within historical linguistics. When I initially talked to KYPark about this (which can be read at User talk:KYPark#Block), they gave me the impression that their only evidence was that words sounded alike. It is my opinion that this is not an acceptable method for deducing genetic relationships on Wiktionary. I hope that at least begins to answer your questions. Please feel free to restate any that have not been answered. -Atelaes λάλει ἐμοί 19:20, 14 May 2008 (UTC)
Also to respond to your query concerning posting something on rfv/d versus simply removing it: That is something which I decide on a case by case basis, and I do not think I could give you a reliable rubric for it. However, I can note on my reasoning for the content removals specific to this case. Concerning the Korean-PIE cognates, I think I covered that fairly well in the preceding paragraph. As for the bit on 못하다, I felt that such a discussion about and critique of Wiktionary policy was clearly outside of the bounds of what we have in the mainspace entries. Thus, discussion about the merits of the specific content were unnecessary, as to include such content would require a complete revamp of what we have in our entries. As for Citations:witch, the etymology was completely unscholarly, and, again, not the type of content we have in our entries (additionally, etymologies, of any quality, do not go in the citations namespace, but rather the mainspace; although since this was a quoted pseudo-etymology, perhaps that is a grey area). -Atelaes λάλει ἐμοί 06:52, 15 May 2008 (UTC)
I don't know enough about word histories to even judge the arguments here, about whether these things belong in Wiktionary, but I am confident that they do not belong on the citations page. As little as possible there should be composed, so if you find yourself trying to phrase something just so then it probably doesn't belong. What you could cite are other references that make your argument for you... although I doubt quoting them so extensively would be fair use. Even so that's apart from the question of how much should be mentioned, if any of it. DAVilla 18:44, 15 May 2008 (UTC)

If your skills in English are not the problem, then it could be that you feel that your work is being attacked by non-experts. You must understand that we have no way of verifying anyone's level of knowledge. This is why it is important to cite credible references when entering potentially contentious information.

by the way who are you at all, mr. dcduring? do you know Korean at all? how much you know? would you dare to compete with me? you choose the best way you like. come on baby. --KYPark 15:49, 14 May 2008 (UTC)

I know an angry person when I am in contact with one. I know material that doesn't belong in Citations when I see it. The material about Korean etymology looked tendentious to me, but I do not hold myself in any position to act on that impression alone in a matter of Etymology - and did not do so, as best I can recall. I am aware that there have been disputes in the past about areas of conjectural etymology. As a Wiki we need to limit ourselves to theories that are fairly widely accepted among lexicographers. DCDuring TALK 16:08, 14 May 2008 (UTC)
dear DCDuring, who are you talking to? me? oh no it's not me. you must be talking to someone else. go ahead. but if you'd answer me, read my word carefully enough. then the answer should come out of itself, not necessarily by you. understood? --KYPark 16:25, 14 May 2008 (UTC)
No. I do not understand. DCDuring TALK 16:29, 14 May 2008 (UTC)
AEL 1
  • Atelaes did the right thing. Lots of KY's contributions have been useful, but the promotion of supposed Germanic-Korean cognates is so far out-there as to be extremely misleading to users. Widsith 19:32, 14 May 2008 (UTC)
Atelaes, I agree with you that it is not good evidence if words sound alike. I would like to note though that they have found evidence of so-called Caucasian humans in the middle of China. That could be a link between Korean and Indo-European, but I'm not sure. Mallerd 21:23, 14 May 2008 (UTC)
This one and more, I believe. It could be something, it could be nothing. Mallerd 21:29, 14 May 2008 (UTC)
Those fair-haired blue-eyed Caucasians would be Tocharians, Indo-European ethnolinguistic group responsible for some well-known IE loanwords into Old Chinese and other neighbouring languages. They certainly do not represent evidence in favour of "Uralo-Altaic" hypothesis, or of common development between IE and Altaic. --Ivan Štambuk 01:36, 15 May 2008 (UTC)
Talk:witch#Etymological notes deleted

This call for the "bad guy" started from the Etymological notes, which are necessary for this talk but were deleted by User:DCDuring. So I copied and pasted them on the above page. From the above talk, I realize there appears to be a very delicate misunderstanding about me, and a resulting injustice done to me. So I have to defend myself positively while showing how others offend me intelligently. Please come and read, though you may need much patience. I'm so sorry to respond individually. Thanks. --KYPark 13:14, 15 May 2008 (UTC)

I have already tried to reason with KYPark in years past, in regard both to his unsupportable folk etymologies and his refusal to stick to the Revised Romanization that we use here for Korean, and he absolutely refuses to listen to reason on either score. Now whenever I encounter his edits, I simply remove everything concerning etymologies and I fix the transliterations. It is true that there are a small number of Korean words that were borrowed from Sanskrit in ancient times, but I don’t think that KYPark knows about those. If he would stick to definitions and grammatical work, he would be a very valuable contributor, but what he does here makes a laughing stock of our Korean entries. Because he’s never going to give an inch, I believe the only options we have are (1) to slowly and tediously correct all of his work, or (2) automatically revert everything he does that hints of a Korean-Indo-European nexus, or (3) just block him for a period every time he adds an etymology. —Stephen 14:06, 15 May 2008 (UTC)
마니다 (manida)
# to handle, cf. French manier  

This illustrates "what he does here makes a laughing stock of our Korean entries" or Stephen's intention to "automatically revert everything he does that hints of a Korean-Indo-European nexus," that is, to remove "cf. French manier."

Is this "etymology," "Korean-IE nexus," and "a laughing stock" indeed? You are supposed to be the opinion leader in this regard. Yet you look hypersensitive or extremely allergic to the possible Korean-IE nexus to your great dismay. I don't understand why you are so harsh. Do you know the fact that the exact 1:1 transliteration that you have opposed so harshly is now being given as additionally as was done by me as a bonus as well as "cf. French manier" above? To me, your allergy looks like a real laughing stock. --KYPark 16:28, 15 May 2008 (UTC)

By the sheer number of human languages (around 7,000) and the number of terms in each language, it is relatively easy to find in 2 non-related languages a few words that are similar in meaning and pronunciation. This information may be interesting, but does not make the word pairs cognates or make the information fit for the Etymology sections of entries. Keep the information in Appendices or the User area, unless it has been accepted by area experts. Thanks to those that cleaned up the entries. --Bequw¢τ 20:06, 15 May 2008 (UTC)
Right, Bequw. The majority of the linguistic community remains unconvinced that an early Altaic version of the Korean language was strongly influenced by IE, so we must rely on published linguistic works to claim that, for example, 마니다 (manida) derived from manier. (I speculate that it more likely derived from the native Korean root in 만하다 (man-hada) or from that in 만들다 (mandeulda, to make), but I won't make such claims in our main namespace.) Similar to the English Wikipedia, we avoid original research in such controversial matters and must fall back on authoritative publications. Rod (A. Smith) 20:36, 15 May 2008 (UTC)
Excuse me Rod A. Smith, but I thought that experts didn't see Korean and/or Japanese as Altaic languages as well. Have I missed something? Thanks Mallerd 20:51, 15 May 2008 (UTC)
Correct, Mallerd, and no need to excuse yourself.  :-) KYPark seems to be in the minority group who considers it an Altaic language, and he further distances himself from the mainstream by claiming that an early version of it was strongly influenced by one or more IE languages. I didn't mean to lend any support to that notion, nor to support any claims that Korean or Japanese are Altaic languages. (I've modified my above post to clarify that.) Rod (A. Smith) 21:30, 15 May 2008 (UTC)
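Bequw's point above, that chance look-alikes between unrelated languages are easy to find, can be illustrated with some rough arithmetic. The numbers below are invented purely for illustration and are not measurements of any language pair. The sketch also assumes that each meaning is an independent trial with the same small probability of a chance resemblance.

```python
def expected_chance_matches(n_meanings, p_match):
    """Expected number of chance look-alikes across n_meanings comparisons."""
    return n_meanings * p_match

def prob_at_least_one(n_meanings, p_match):
    """Probability of at least one chance look-alike, assuming independence."""
    return 1 - (1 - p_match) ** n_meanings

# Hypothetical inputs: 1,000 basic meanings compared between two languages,
# each with an assumed 0.5% chance of a similar-sounding word by accident.
n, p = 1000, 0.005
print(expected_chance_matches(n, p))
print(prob_at_least_one(n, p))
```

Under these assumed figures a handful of matches is expected by accident alone, which is why similarity in sound and meaning is not by itself treated as evidence of a relationship.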
There's nothing wrong with providing readers with possible mnemonics for learning Korean words, but there is no place for such information in mainspace entries. Specifically,
  • such information doesn't belong in the definition line, since it has nothing to do with the definition of the term.
  • It doesn't belong under "Derived terms" or "Related terms," since there is no etymological relationship.
  • And it certainly doesn't belong under "See also," since a reader following the link will learn nothing about the Korean word.
Such information could be placed in an Appendix:Mnemonic aids for English speakers learning Korean words, or similar. Of course, a French sounds-alike term would be useful only for French speakers, so that would need to go in a separate Appendix. And frankly, I don't think any such correlations are useful for the majority of language learners; effective mnemonic strategies are something which individual learners have to work out based on the peculiarities of their own brains. For instance, I first learned "매다" as "to weed," by drawing a picture in my vocabulary book of a hawk (매) weeding a garden. Would that be useful enough to put in an appendix? I rather doubt it. -- Visviva 03:28, 16 May 2008 (UTC)
Having read through the entire exchange to this point, I think I understand the nature of the problem more clearly. The original question posed by Atelaes was whether anyone else could be the "bad guy." In looking at the qualifications of the various contributors, it would seem that Visviva and Stephen are the most qualified. Of the available contributors, these two seem to be the most knowledgeable about Korean. If anyone has a chance of getting KYPark to be reasonable, it would be another Korean expert who could provide credible evidence to counter any dubious claims. I know nothing about Korean, so I obviously wouldn't even attempt to debate KYPark about Korean. However, I do think it is legitimate for non-Korean speakers to ask a Korean speaker to refrain from original research and to cite his or her sources etc. -- A-cai 13:52, 16 May 2008 (UTC)
AEL examples
Excuse me, but I have to move to the leftmost as follows:
매다 (mae-da)
# to weed, cf. mow

Uses

* 호미매다 to weed the weed with the hoe.
* 으로 베다 to mow the grass with the sickle. 

This should be absolutely all right. This has nothing to do with original research. I wish to make this more interesting, surprising, or motivating as follows:

매다 (mae-da)
# to weed, cf. mow, Dutch maaien, German mähen, 
  Old English māwan, Old High German māen. 

Uses

* 호미매다 to weed the weed with the hoe.
* 으로 베다 to mow the grass with the sickle. 

Neutral Korean students would be surprised, while some of Korean scholars, anti-Eurasiaticists, anti-Euro-Koreanists, etc. probably more or less upset. But do you insist that this is an original research, especially claiming the "Korean-IE nexus"? Why should this be so different in effect from the first example?

Currently, Wikt misses this well-known etymology in the mow page. I don't know if it has done so all the while. What if that etymology were deleted? Then I cannot help but guess that it may have been deleted by those who were badly afraid of the relation to Korean 매다. Such could probably exist on earth. And they must hate me as if a witch hated by Western Christians. Then, they would make me a prey of witch-hunt. To me, such is a war, so-called w:science war!

Refrain from saying too easily here. Please try to be more cool, smart, and neutral. Evaluation is up to everyone, hence mostly subjective rather than objective. Note that I look like being witch-hunted. Please do any justice to the likely prey to the wicked, twisted or evil, like me. Your sense of justice is on the testbed. (Let me drop below another example for your reference. I am not sure but may further respond to Visviva.)

wick (plural: wicks)
# a bundle of twisted fibers in a candle or lamp.  

See also
* witch 
* wise
* wit
* white
* bitch
* Korean 빛 (bit, bich) light

--KYPark 03:30, 17 May 2008 (UTC)

KYPark, I'm not sure I understand what you are trying to say. Do you mean that the English word mow is somehow related to the Korean word mae-da? If that is the case, can you list a reliable dictionary, book or website where you found this information? If you cannot provide a source of the information (other than yourself), then it would qualify as original research. As you know, original research is not allowed. -- A-cai 04:50, 17 May 2008 (UTC)
A-cai, no, I don't. I said above: "But do you insist that this is an original research, especially claiming the "Korean-IE nexus"?" This is to deny your and others' doubt. The above comparative data may be regarded as such, if I were listing them under Etymology, Derived terms, or Related terms. So far I've denied again and again and again. But they would not understand and believe my word as such. So I cannot help but doubt their mindset, orientation, or motivation. In a nutshell, I've edited Wikt mainly to help Korean students learn foreign languages, esp. English more effectively. (I wish them to know that Korean is not such an island as the mainstream linguists believe.) They are most famous, if not notorious, for spending an enormous amount of money in learning them. Unfortunately, however, their achievement is very doubtful, not to mention their hardship and loneliness. All the gentlemen above, including you, need not bother them, nor what Korean education should look like. This is a matter of Korean strategy, in a way at least, which I thus warn others to be very careful not to interfere with. Thanks. --KYPark 06:58, 17 May 2008 (UTC)
Reducing the language anxiety of Korean students is a worthy goal. However, it is extraordinarily unlikely that Korean EFL students would be looking up common Korean words on the English Wiktionary. I say this as someone who has been working with such students for the past 6 years. It is far more likely that they would be looking up English words on the Korean Wiktionary. If such content belongs anywhere, then, instead of putting notations like "cf. mow" at 매다, it would make much more sense to place a note at ko:mow, something like "<매다>의 뜻과 비교됨." (I'm not sure whether such content belongs on KO either, but that is a KO issue.)
Likewise for the respective Old High German (etc.) material. This is completely useless for someone learning either Korean or English, but might conceivably be useful for L1-Korean students of Old High German. In any case, it doesn't belong here. -- Visviva 07:54, 17 May 2008 (UTC)
Traditionally, the Korean-English dictionary or 한영사전 in the book form and now online has offered the kind of information given under == Korean == of the "English Wiktionary" (en.wiktionary.org) you mentioned. No doubt such dictionary has been a must to put Korean into English. Now there are a number of similar online dicts. For example, just take a look at this page for 매다. And compare this with my example given above, and evaluate the difference. Explicitly and implicitly, there is everything I answer you.
The naver page ends with a blind alley or 막다른 골목, while my wikt edit is widely open toward boundless information resources. My edit is not just for young Korean students, but for any Korean who bothers foreign languages, e.g., to know Dutch maaien or German mähen, if not Old English māwan, Old High German māen. Suppose her common exploration routine such as 베다 or 매다 > mow > Translations > Dutch maaien or German mähen. On my page, she may be glad to go direct to her destination, rather than through Translations.
What is the comparative superiority of English Wiktionary (en.wiktionary.org) over all the other online dictionaries? Outstandingly through the hub called Translations of == English ==, all words of all languages are interconnected within one framework. In principle, anyone can begin with any word in any language and end with any other word in any other language. This is just great to anyone! Visviva's real intention is not to advise Koreans not to bother using "English Wiktionary" (en.wiktionary.org). --KYPark 13:26, 17 May 2008 (UTC)
But on the Wiktionaries, there is not one Korean-English dictionary, but two: that found on the Korean (KO) Wiktionary, ko.wiktionary.org, and that found on the English Wiktionary, right here. The difference is that the KO Wiktionary aims (in part) to provide English translations for speakers of Korean, while this Wiktionary aims to provide English glosses and usage information on Korean words for speakers of English.
These may seem similar, but in fact there is an enormous difference between the two. If you have ever compared a K-E dictionary made for Korean speakers with one made for English speakers, you will understand this.
  • If you want to provide translations of Korean words in multiple languages, you can only do so on the KO Wiktionary. English words are the hub of the English Wiktionary; Korean words are the hub of the Korean Wiktionary. Cf. ko:묶다.
  • If you want to assist Korean-speaking students of foreign languages, you will only reach your target audience through the KO Wiktionary.
  • On the other hand, if you want to provide information for English-speaking students of the Korean language, you should contribute here on the EN Wiktionary.
The KO Wiktionary is currently quite neglected. Nonetheless it is, in principle, the equal of this project; eventually it should contain as much information as any other Wiktionary. However, it cannot do so without the help of native Korean speakers like yourself. -- Visviva 14:14, 17 May 2008 (UTC)
KYPark, you have now just admitted to everyone that you have no way of verifying your claim. If you cannot verify information, then it doesn't belong on Wiktionary. At Wiktionary, we cannot simply post whatever we want, and assume that others will not challenge it. You should be prepared to defend any edit with solid evidence. Some of my entries have been challenged by non-experts as well. The best way to handle such a situation is to provide proof which is independently verifiable. What proof (book, dictionary, website etc.) can you offer that your above example is not an example of a false cognate? -- A-cai 07:38, 17 May 2008 (UTC)
Oh dear A-cai, again you misunderstand my English. By "A-cai, no, I don't." I meant "I don't mean that the English word mow is somehow related to the Korean word mae-da." This was to respond to your primary question: "Do you mean that the English word mow is somehow related to the Korean word mae-da?" By my answer, I need not answer the next. Watch out your English understanding. Cheers. --KYPark 13:26, 17 May 2008 (UTC)
KYPark, I think this is part of the problem. Your English is difficult to understand. You seem to have an adequate grasp of English vocabulary. However, your English sentence structure needs work. Ok, so if that was not what you meant, then what did you mean? If explaining your argument is too difficult in English, perhaps you could post your explanation in Korean to Visviva, and he can translate it into idiomatic English for the rest of us. I do not wish to embarrass you by suggesting this. I only wish to help you communicate with us. After all, isn't that why we're all here? -- A-cai 13:44, 17 May 2008 (UTC)


== Korean ==

=== Alternative spellings ===

* 띄엿 (ttuiyeos, ttuiyeot) (obsolete) 

=== Noun ===

 띠앗 (ttias, ttiat)

# brotherhood, fraternity, fellowship 
  Cf. Dutch, deutsch, w:Theod, þeod 

  • Is this sort of thing just a laughing stock or witchcraft?
  • Isn't it a great fun and run that Koreans may enjoy?
  • Is it doing any harm to what or whom, as if a fraud?

As I said, Visviva, nothing but English Wiktionary is just great to anyone. It is not in this case that "anything goes." There is a royal road in learning, say, English in English! This is partly why young Koreans spend so much money in English-speaking countries. They should better or more use the English-English dictionary, say, English Wiktionary or Merriam-Webster, than the English-Korean except in the beginning. This is partly why I would not accept your advice for me to go to KO, though it may not stand for Knock Out. Definitely no thanks anyway. But let me argue this way instead.

The international language or lingua franca shifts from language to language. English is the currency of which native English speakers take great advantage. Yet it is not their community's monopoly. Simply their national and international languages are the same. Everybody's language is nobody's language. English as such, e.g., en.wiktionary, should remain a universal melting pot which should include the Korean nativity per se together with the others. An objective, accidental, factual, neutral, unvocal, uncommitted, uncrowned, unaffected, undeniable, unalterable, unassailable simple comparison of Korean mae-da with Dutch maai-en should not be excluded from en.wiktionary. I do hate such nationalists as create a myth to brainwash their people as if they had been specially created by their God. I would never do such ridiculous evil.

Linguistics could become a science as far as subscribing to scientific methods, empirical and rational, inductive and deductive. Theories or hypotheses are rather rational and deductive by definition. So are Indo-European, Eurasiatic, Uralic, Altaic, Ural-Altaic, and so on. Neither is either historically or archaeologically proved. Either is no more than hypotheses, each aiming for better explanation than other. The Altaic hypothesis takes Korean as Altaic, while some others take it as an island not without reason. Suppose Korean shares far more cognates with European than with the mainstream Altaic. Then the Eurasiatic would best explain this fact, however minor. Every hypothesis has its own use. Even the Ural-Altaic should not be wholly denied. "Anything goes," according to w:Paul Feyerabend (1975). To treat it as rubbish is to degenerate linguistics into a lesser science. The more claim for community opinion, the more degeneration into the lesser science.

The "normal science" performed within a community called "paradigm," as noted by the science sociologist w:Thomas Kuhn (1962), is not quite scientific but quite socio-political. I would rather call it scientific pathology. In effect, if not on purpose, he stirred up scientists to the wrong, unscientific, political direction. His notion of normal science is abnormal science in an ideal sense. Scientists have the reason to prefer pragmatic interests, personal and communal, or pursuit of happiness to pursuit of truth. (In Korean parlance, 염불보다 잿밥 (yeombul-boda jaeppap) literally means that the mass service of Buddhism is valued less than the mess served to Buddha.) They used to wage w:science wars such as w:Creationism vs. w:Darwinism. The former fights for the Christian community, while the latter for the Darwinist.

In a sense, science and the w:enlightenment movement emerged and evolved in reaction to the clerical community called church. (Note that the church is no more than a community, whatever absolute claims it may make. Any church would make such claims.) At least, the movement, if not science too, became highly political, culminating in w:French Revolution. The English industrial, the French political, and the German religious revolutions share the same thing, that is, rebellion against the Christian church.

To know Korean truly is to know its greatness underlying. Unfortunately, Koreans seem to know little about that, I fear. It has undergone lesser changes. Its syntax and vocabulary is well organized. In contrast, English has undergone greater changes, hence a highly corrupted or eccentric version of the Germanic and European. It almost does without the European inflection. The /-en/ ending of Germanic verbs gave way to the root and to-infinitive. OE "mawan" now sounds "mow." Everything has been simplified, if not corrupted, but for the vocabulary of the greatest mixture. It may have changed so as to be used by a great mixture of ethnicity such as w:Huns, especially of Altaic origin, perhaps from the Far East!

The mysterious name Hun may have been simply derived from the Chinese hun (混) meaning (1) mixture (2) western barbarians or "西戎混夷". Anglo-Saxons may have risen from Scythia or w:Khazaria afterwards around the Caspian Sea, aka Bahr-e-Qazvin meaning "Khazar Sea" in ancient Arabic. Then it would be a great "laughing stock" for the British who may be more Hunnish to make fun of Germans as Huns or Sauerkrauts, (that is, most similar to one of the best Korean trademarks, kimchi 김치, together with the millennia-old caviar to Korean cet in Yale).

Such English eccentricity could never be explained by the helpless I-E hypothesis but hopefully by the Eurasiatic. Strangely, the Far Western English language has been easternized in effect, if not in fact. I can hear all sounds spoken in King's English, say, by Queen Elizabeth, as clearly as Korean.

Islands on the surface rarely float like an iceberg but mostly connect with the land mass below the surface. Such may be Korean. If it dubiously or hardly connects with the surrounding Altaic on the surface, it may do more closely with the other mass below the surface. Strangely it rarely shares cognates with the Altaic neighbors, in spite of syntactic similarity. Thus linguists regard Korean as an island. But the SOV syntax does not surely warrant the linguistic neighborhood. The older Latin also used SOV, which may have been more prevailing two millennia ago.

Korean is a great mystery as well as a great heritage. Such is the case with Korean Scythian-like clothing, fermented food in variety, floor heating, unbeatable archery and hand work, Amazon-like tough women (millennia-old iron headgears for women were excavated), half of the world's dolmens, and so on. --KYPark 06:28, 19 May 2008 (UTC)

KYPark, you do well to emphasize the point that linguistics is (or at least should be) a science, not a democracy. And it is quite true that science should be open to any theory, tested on evidence and not whether people like it or not. However, I think what you may be confusing is whether we're doing science here. The fact is, we're not. We are not coming up with, testing, and debating theories here. That is not the purpose of Wiktionary. What we are doing is simply reflecting academic consensus. Thus, we are not at liberty to come up with interesting thoughts and propose them to our readership. We only copy what the experts have already figured out. If you would like to try and argue for a Korean-PIE relationship, you are certainly welcome to do so. But do it in linguistics journals, not here. This is not an academic forum. It is an academic reflection. -Atelaes λάλει ἐμοί 06:57, 19 May 2008 (UTC)
Atelaes, I was arguing that things like academic consensus are very very rare, but ever-diverging points of view or community opinions in disguise of consensus. For example, there can be no consensus between creationism and evolutionism in parallel for ever, between Judaism, Christianity, and Islamism within Abrahamism, between unaccountable sects of Protestantism. Such is science carried on by ever-diverging competing conflicting paradigms or academic communities, as anything goes! or as if Thomas Kuhn had taught scientists to behave divergingly like religion rather than convergingly! Religion and science are a firm belief system. These are too dirty for me, hence none of my business. I have no intention whatsoever to promote or prove the Eurasiatic hypothesis, Euro-Korean hypothesis, or the like. But I would like to share and communicate the objective facts I know about Korean. These are mere data perhaps to be evaluated by scientists from theory to theory. But I don't bother them but the general readers. To help simply compare Korean mani-da with French mani-er is not science at all, but to insist both are cognate surely is. The question of fuzzy boundary is very crucial in science, so easily leading to category mistakes. It is a foolish category mistake to insist my help is science. A tour guide to show us around is not a scientist at all! A Shakespeare cannot become a Newton at will. (my parody in reverse order) But he helps us open our eyes wide to see what the world looks like. Please try to get to my point. You may ignore all my argument mainly aiming to draw attention to Korean, but evaluate the example in the boxes on top of it and answer the three questions below the boxes. Thanks. --KYPark 13:26, 19 May 2008 (UTC)
I've already told you: compare Korean mani-da to the progeny of Latin manus on your blog or personal web pages, but not on the definition lines of either here on Wiktionary, for that would be masquerading a supposed IE-Korean genetic relationship based on nothing but vague sound similarities. The mnemonics argument is also not applicable to mainspace (its usefulness is debatable even in a separate appendix), as you've been told.
About your "anything goes" and "science as a belief system" claims - you're barking up the wrong tree. We don't want proofs or new theories or invalidating the old ones (which you've hardly done in your lengthy rant), but cites supporting mainstream theories established by professionals. To what extent they are wrong is not our problem. --Ivan Štambuk 16:31, 19 May 2008 (UTC)
  • I expected Visviva to respond to my long argument mainly intended with him. But he did not.
  • Instead, Atelaes responded, mainly arguing for "academic consensus" as the sole source of reflection on Wiktionary. I disagreed and advised that things like "academic consensus" is very very rare. And I asked him to ignore all my argument but answer my three questions on top. But he did not.
  • Instead, Ivan responded, mainly repeating his claim elsewhere. In a sense, he answered my last question on top "Is it doing any harm to what or whom, as if a fraud?" ignoring the previous two. His word "masquerade" would be equivalent to my word "fraud" that may do "any harm to what or whom."
  • In effect, he insists I do harm by "fraud" or "masquerading." What is this? Is this what Wiktionary says to me? Please advise me how I can be assured that this is what Wiktionary means. I advise Wiktionary and Wikipedia to answer my question when it is well aware who I am.
  • Meanwhile, I cannot help but describe this is gaepan (개판) in short in Korean. No more witch-hunt please. Thanks. --KYPark 15:18, 20 May 2008 (UTC)
Nobody wants to respond to your long "argument". I'm personally just waiting for this discussion to end and for you to accept that this sort of speculative original research does not belong on our definition pages. I think it's blindingly obvious that consensus is against what you're trying to do. Mike Dillon 15:41, 20 May 2008 (UTC)
KYPark, I will attempt to give you a short answer. Wiktionary and Wikipedia have a policy of no original research. Your argument is that you should be allowed to enter original research into Wiktionary, because there is no academic consensus on the subject. So far you have given us your personal opinion. You have not cited a single source that supports your claim. I find your argument about creationism vs. evolution to be disingenuous, because there is academic consensus among scientists on that subject. You ask us why it is harmful to enter unverified information into Wiktionary. The reason that it is harmful is that it affects our credibility as a dictionary. If Wiktionary cannot point to a reliable source for its information, then nobody will take Wiktionary seriously. We want people to take Wiktionary seriously, because we want Wiktionary to survive and thrive. -- A-cai 12:42, 21 May 2008 (UTC)
It is a shame that one often defeats oneself, especially without knowing the fact. It is unclear if to let her know that is to do her harm. So some just suck their cheeks or stick their tongue in cheek. But I would not make fun of her behind her back, but tell her the truth, which in itself is neutral, but could sound cruel to liars and obscurantists.
Suppose w:Darwinism or w:evolutionism is a thesis of academic consensus. Then should its antithesis such as w:creationism or w:intelligent design be deleted from Wikipedia and Wiktionary? Should w:Lamarckism be deleted? Should w:Ural-Altaic as rubbish be deleted? In practice, few things die and most things do. Even w:Flat Earthism survives! Wikipedia is very proud of the greatest number of entries. What is the implication of this greatness? Are there that many theses of academic consensus indeed? Oh, no, never, ever!
The idea of "academic consensus" is a huge stumbling block and self-contradiction. Academia in essence is a place for partisanship Kuhn called paradigm, rather than for consensus beyond paradigm. Academic circles are like political parties. It is now well accepted that science is not value-neutral. The title The Collapse of the Fact/Value Dichotomy authored by w:Hilary Putnam (2002) is striking. Korean linguistic facts, for example, should be of more value to Korean general public above all than scholars, education than science.
Christians see Muslims evil, and vice versa, namely, proto-religious war endangering the peaceful excluded middle. A practical solution would be peaceful co-existence of black and white, say, Muslims and Christians. Wikipedia where anyone does and anything goes in general is the last place for black-and-white judgment and choice, but the ever-lasting place or melting pot for black-and-white confusion. All it could and should do is to inform readers of both black and white for their own judgment and choice, namely, w:reader response. --KYPark 14:37, 22 May 2008 (UTC)
You are missing the point: Wikipedia articles and Wiktionary entries do not deal in original research. The articles and entries relating to Darwinism and those relating to evolutionism, as well as all the others you mention, reference respected reliable third-party sources. Your etymological additions are not backed up by any third-party sources and so are not accepted here. There are plenty of sites on the internet that operate with an "anything goes" philosophy, but Wiktionary is not one of them. The governing philosophy here is reliability and verifiability, which means that everything must be sourced and referenced with reliable sources. If it cannot be reliably sourced then it must be deleted, regardless of the importance or otherwise of the word/topic/goal/etc. Thryduulf 15:23, 22 May 2008 (UTC)
The admin community is supposed to be brainwashing me and the third party by repeating again and again as if I had imposed my original research on Euro-Korean etymology on Wiktionary. That is, it seems to be unjustifiably harassing me. I take this likely offence very seriously. My contribution must have included a negligible amount of original research if any. I cannot show up the whole state of affairs, as what I had done was mostly destroyed by the community one-sidedly. Instead, I recently brought a few new examples to attention, and asked how problematic such would be. The ever-changing answerers have rather avoided answering my questions directly while repeating their one-sidedly assumed claims in other words to the brainwashing effect. From those examples and another new page 고인돌 (as originally edited by me), you should discuss very persuasively which parts are definitely an original research and why. Otherwise, you are in effect harassing me without enough evidence beyond the reasonable doubt. You would know perhaps better than me what could be the possible consequence of such repeated false charge and evil harassment. --KYPark 08:45, 23 May 2008 (UTC)
Either you are trying to present serious etymological relationships, in which case you need serious etymological sources, or you are simply adding accidental similarities between words in a handful of the 7,000+ languages we try to cover, because you think readers will find them interesting or motivating. (You have appeared to make both claims in this discussion.) In the first case, the claim must be verified; in the second case, this is simply indiscriminate trivia, which we do not welcome here. -- Visviva 09:50, 23 May 2008 (UTC)
You show me two choices. But you are well aware that I deny the first is my choice. So you actually allow me just one (second) choice and dictate it based on your subjective evaluation, without discussing "very persuasively." OK, anyway. But, as you are not supposed to be the wiki law-giver, please convince me that all my above examples plus all my edit on the page 고인돌 are useless and undebatable enough to be entirely ignorable and deletable (even without prior discussion with the original editor), and that your remark is the final, non-negotiable wiki policy. Thanks. --KYPark 15:17, 23 May 2008 (UTC)
What do you mean by "share the same Roman syllable"? What usefulness, if any, is there in comparing Korean dol and English *dol < Breton teol (which means table not stone) appearing exclusively inside the adopted compound term dolmen? Looks to me that you're again trying to masquerade a genetic relationship based on vague phonetic correspondence. --Ivan Štambuk 11:51, 23 May 2008 (UTC)
``Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.`` Ivan, you are very irresponsible to answer me without properly understanding the above single self-evident sentence. So are most others. So are most witch-hunters in Western history. So I called this talk the twisted or wicked witch-hunt party loud and clear, so convincingly from the beginning. So I blame you all for blaming me unjustifiably, without enough understanding and evidence beyond the reasonable doubt. At least on this occasion, you mistook my word and harassed me. I wonder if you are brave enough to apologize for this, and again to look for my weakest link you have to attack. Cheers. --KYPark 15:17, 23 May 2008 (UTC)

Response to KYPark

KYPark, you invited us to explain which parts of 고인돌 contain original research. I don't speak Korean, but I will attempt to give you an answer. First, let us look at your definition:
===Noun===
{{ko-noun|rv=goindol}}
  1. A dolmen, a prehistoric megalith having a capstone supported by two or more upright stones.
The above does not constitute original research, and is easily verifiable. However, you should include a references section to show where the information comes from. Here is how I would do it in this case:
===References===
*{{pedialite|고인돌|lang=ko}}
*{{pedialite|Dolmen}}
The {{pedialite}} template will give you the following text:
The above is convincing evidence that the Korean word 고인돌 equates to the English word dolmen.
Now for the second part, your etymology says:
===Etymology===
From 고인 (goin, “supported”), adnominal form of 고이다 (goida, “to support”) + 돌 (dol, “stone”). Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.
Where did you find the information in the etymology section? Did it come from a dictionary? Did it come from a book? Did it come from a website? Did it come from an academic journal? We don't know where you got the information from, because you don't state that in a references section. If you cannot point to a dictionary, book, website, academic journal or other reliable document as the source of your information, then we are free to assume that it is your own personal opinion. If it is your own personal opinion (even if your opinion turns out to be correct), it is considered to be original research, and is not allowed on Wiktionary.
Does the above answer your question? -- A-cai 12:01, 23 May 2008 (UTC)
KYPark, one more thing. The purpose of the etymology section is to explain the origin of the word 고인돌. With respect to the second sentence:
Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.
The second sentence does not explain the origin of the word 고인돌. Therefore, it should not be included in the etymology section (whether it is original research or not). -- A-cai 12:17, 23 May 2008 (UTC)
KYPark, take a look at 刻舟求劍. Notice how I provide a source for each piece of information. -- A-cai 12:24, 23 May 2008 (UTC)
Again and again and again, you mistook my word and harass me! But I will help you understand me properly. First you need to go to the history file of 고인돌 I created today. There were great edit wars today, presumably without your knowledge. The most important admins visited and edited against my edit. Have you done any? Oh no forget it, but examine carefully the historical processes, and sort out what is my real contribution. Really I did not want this sort of confusion, and asked my original edit to remain as such for a week. Nonetheless, my edit was immediately destroyed perhaps to your dismay. But forget all these, but just remember that you have to answer me at all after you have mastered the whole history of this god-damned page! Understood? Many Thanks. --KYPark 15:36, 23 May 2008 (UTC)
Hurriedly, just one more thing. Read the talk page, too. Thanks again. Sincerely yours, --KYPark 15:43, 23 May 2008 (UTC)
KYPark, I now see that the following part was deleted:
  • "Dolmen" originates from the expression taol maen, which means "stone table" in Breton. (Beside this Wikipedia article: Note that this Bretonic word was allegedly incorrectly fabricated so that taol stood for "table" and maen for "stone." Also note an assumed Sino-Korean word consisting of (dol, "stone") and (Japanese men, Korean myeon, "roof").)
  • The etymology of the German Hünenbett or Hünengrab and Dutch Hunebed (lit. Huns' bed) all evoke the image of giants building the structures. Of other Celtic languages, "cromlech" derives from Welsh and "quoit" is commonly used in Cornwall. Anta is the term used in Portugal, and dös in Sweden.
KYPark, the above was deleted because it is not directly related to the origin, definition or usage of the word 고인돌. Wiktionary guidelines are fairly clear about what kinds of information can be included in an entry (see: Wiktionary:Entry layout explained).
Finally, please read the following Wiktionary policy pages: Wiktionary:No personal attacks and Wiktionary:Assume good faith. -- A-cai 20:23, 23 May 2008 (UTC)

I dare to declare I won

  • If not, just talk why not. --KYPark 17:34, 23 May 2008 (UTC)
  • It was 15:17, 23 May 2008 (UTC) I first began to count. --KYPark 17:46, 23 May 2008 (UTC)

Do you want permission to add etymologically unrelated words to the etymology section of Korean entries? If so, then a simple "no, that's not the purpose of the etymology section, but feel free to use the talk pages for such trivia" seems sufficient. If you want something else, please be specific. Rod (A. Smith) 18:01, 23 May 2008 (UTC)

I still sit up at 5:20 local time. --KYPark 20:23, 23 May 2008 (UTC)
Yeah, I read you, A-cai. Thanks always. --KYPark 20:35, 23 May 2008 (UTC)
May I go to bed at 6:04 local time? Good night everybody... --KYPark 21:00, 23 May 2008 (UTC)

Wiktionary:sysop-Deleted

This page needs a complete overhaul for several reasons.

  • When viewing a page that has been deleted, you are shown the message "Note that administrator comments older than one year may be inaccurate, as explained in Deletions" (which redirects to Wiktionary:sysop-Deleted). However when you get there, you find there is no explanation of this.
    The history of the page that generates this message, MediaWiki:Recreate-deleted-warn shows that this was added on the 14th of December 2007. It gives no indication of whether it is older than 1 year from (approximately) that date, or a rolling 1 year period.
  • The content of the page is written in a very harsh manner, seemingly designed to scare people away. We don't want vandals, true, but they are not the only people who will see this.
  • Number 4, for example, basically tells everyone who wants to add a term at a protected title to bugger off to Urban Dictionary. It mentions nothing about what to do if you know (or even think) you're not entering the same term - it is entirely plausible that in the past some vandal repeatedly recreated something completely bogus, e.g. perhaps at somewhere like "ogof", such that the entry title was protected. Years later a new contributor comes along and wants to add some words in a language that we don't have many entries for, they happen to start with "ogof" (which means "cave" in Welsh iirc) but are told to go away. There is no instruction to ask anywhere whether their word is allowed or not, not even to look in the list of protologisms. The result is they go away and we lose a valuable resource.

  • Number 3 says "before resubmitting nonsense entries" and then goes on to explain about requiring three citations and not to use secondary sources. We don't want nonsense entries resubmitted, and we want people who don't know about the citations requirements to find out about them easily, neither of which this entry does.
  • There is no explanation for what is meant by:
    • "attack page" - number 1 explains it (sort of) but never using that term.
    • "bad redirect" - there is something about redirects but it starts off by saying we're not Wikipedia, which isn't going to make it easy for anyone looking for what is meant by "bad redirect"
    • "bad entry title" - this is tangentially covered in the "Entire classes of terms are deemed not Wiktionary-worthy." info at number 7, and even less well at number 4 about page titles being protected
    • "copyright violation" - it should be obvious what this means, but there is nothing anywhere on the page (not even a link) about why we can't take copyrighted material
    • "Creative invention or protologism, (use WT:LOP)" - there is no explanation of what we mean by either "creative invention" or "protologism" and why they are deleted. You might follow the link to "WT:LOP", but if you don't it's meaningless jargon ("what is WT:LOP and how do I use it?"). Buried towards the end of number 3 (which waffles about explaining what deletion is for those who don't understand) are the criteria for citations of use. This is also briefly mentioned at the end of the section about classes of words not being allowed, using phobias as an example (and not a brilliant one either).
    • "Failed RFD do not enter" - what is RFD? How do I find the reason RFD deleted this word (or a word with this spelling)? What do I do if it isn't the same word?
    • "Failed RFV do not enter without Valid citations" - what is RFV? The link is to WT:CFI, which does not use the word "citations" in any header so you can't find it in the TOC. In the body text the word "citations" is used only in the sections about "Fictional universes" and "Brand names", which doesn't help the person unfamiliar with Wiktionary who is trying to enter a word that is not a brand name and not related to a fictional universe (e.g. ogof again).
    • "Fatuous entry". Possibly covered in the "user tests" section (2) but the word is not used anywhere on the page.
    • "misspelling of" - of what? I've seen other pages that say they are a misspelling of something, why was mine deleted?
    • "Name of a person" - Why is this not allowed? You have entries for first names and surnames?
    • "Previously deleted/failed RFD or RFV" - What is RFD? What is RFV? Why was it previously deleted?
    • "Random formatting" - Why was it wrong? Where do I find how to get the formatting right? I thought this was a wiki and people were meant to clean up where others got things wrong?
    • "Promotional material" - Why can't I put it here? Where can I put it? Where do I appeal if it isn't promotional material?
  • The page doesn't give links to other useful pages - there are no links to (or explanations of) RFD, the CFI, the Tearoom, Beer Parlour, Grease pit, information desk, etc. There is one link to RFV (in fact this is the only link on the page), but it gives no expansion of the acronym or explanation of the term.
  • Entry 8 is "other minor reasons", which they are, but there is no explanation of what the terms mean.

In short it needs a complete rewrite. Thryduulf 13:20, 14 May 2008 (UTC)

The page communicates more to us ("I had a legitimate rationale rather than mere pique for this deletion.") than to those to whom it is purportedly directed. The negative thrust of the message is the problem, IMHO. Giving the user a positive path to follow might lead to less bad feeling and get us some good content at modest cost in terms of additional entry-review time. Vandalism and silly entries are an inevitable part of Wikidom, I think. We need to cope. It might be a good idea to encourage folks to use requested entries, information desk, feedback, or the talk page for the entry or to direct them to a special-purpose place (proposed-change-space?) where they could enter what they wanted to enter without it trashing principal namespace. DCDuring TALK 15:13, 14 May 2008 (UTC)
I suppose there is a legitimate point to not facilitating the access to a possible vandal or a frustrated would-be contributor to a place where they can be disruptive. A positive path to a place where they can either contribute or vent spleen might allow negative emotion (anger) to dissipate. DCDuring TALK 15:23, 14 May 2008 (UTC)
We have the sandbox though it is very under-advertised. I dislike the appearance of this page in the deletion comment, though I do agree that ensuring we don't actually save personal information or other garbage in the deletion log is a good idea (which is why the PREF replaces the custom deletion comment with this). If someone does want to rewrite this page, then feel free - though maybe at a better title. Conrad.Irwin 19:53, 14 May 2008 (UTC)
I'll take a shot at rewriting this, but it will probably take a few days. As for a better title, how about Wiktionary:Explanation of common deletion summaries? Thryduulf 21:07, 14 May 2008 (UTC)
Wiktionary:Deletion currently redirects to sysop-Deleted, maybe we could use that (as it's slightly shorter ;)? Conrad.Irwin 21:25, 14 May 2008 (UTC)
I've made a start at Wiktionary:Explanation of common deletion summaries, feel free to move it there if you want. I've only done the introduction and one section so far, both of which need checking, etc. Thryduulf 22:35, 14 May 2008 (UTC)

what is all this fuss at all?

just total nonsense. you should know. why? —This unsigned comment was added by KYPark (talkcontribs) at 10:59, 14 May 2008.

If this is to do with Atelaes' thread above, please comment there. Otherwise I have no idea what you are talking about. Conrad.Irwin 19:56, 14 May 2008 (UTC)

Wiktionary:English pronunciation key

I've made a number of changes to the pronun. key, which you may feel free to disagree with:

  1. removed erroneous information
  2. removed distinction between monophthongs and diphthongs, listing all vowels under one table. Why? (a) because we don't need to teach articulation to anyone, we just need to represent sounds, (b) this is usually the tack taken in dictionaries
  3. reordered sounds to approximate English alpha order (instead of the previous articulatory ordering). Why? (a) because this makes it easier to find the symbol for the naïve reader (who knows nothing of articulation), (b) this is standard practice in dictionaries
  4. I removed subphonemic distinctions (like the flap and velarized L). Why? (a) these pronunciations are entirely predictable by regular phonological rule, (b) dictionaries generally do not indicate nonphonemic information
  5. I added Vowel + /r/ distinctions that dialectally naïve wiktionary editors may not be aware of. These are usually implicit in dictionaries, but I suggest that we explicitly mark their differences due to nonprofessional editors:
    1. The difference between Mary, merry, marry
    2. The difference between serious and Sirius
    3. The difference between hoarse and horse. This one is perhaps debatable since this distinction is being lost in many standard dialects due to language change. However, these two sounds are usually distinguished in American dictionaries and in the 2nd edition of OED (however, the new online edition has changed to treat them as the same, ignoring the old folks and minority RP speakers). (If wiktionary wants to ignore this distinction, then we need to revert my edit to hoarse.)

Discuss? Ishwar 13:40, 15 May 2008 (UTC)

On a quick look through the changes, I didn't see any problems, although a note at ʍ would be a good idea, since the distinction exists only in a portion of the range where English is spoken. And we may want to keep the distinction between the flap and velarized L. I'll wait to see what others think before commenting any further. --EncycloPetey 17:52, 15 May 2008 (UTC)
ʍ belongs next to w, since the English is spelled the same and the distinction is easier to find and understand when they are next to each other. Notes should be in the same column as examples, rather than the footnote in /ɹ/. Why are there separate entries for the identical vowels for /æ/ær/ and /ɪ/ɪr/ (wouldn't the /r/ be /ɹ/ in GA anyway)? Other symbols may want to have examples if they don't get cluttered (because it's a bit hard to judge the vertical position of the symbols in isolation), which may also help replace the note describing the use of stress mark.
Looks like an overall improvement. Thanks. Michael Z. 2008-05-15 19:27 z
I didn't change the footnoting.
Yes, ɹ should > r if you're using ɹ.
The separate entries for ăr and ĭr are to explicitly indicate to American editors that ăr ≠ âr ≠ ĕr and that ĭr ≠ îr. These sequences have merged in many standard American dialects. This is in fact somewhat redundant and usually kept implicit in dictionary pronunciation guides (although you can find this out in the body of the dictionary by comparing Mary (mâr), merry (mĕr), marry (măr) and myriad (mĭr), Sumerian (mîr)). Plus, the guide already had the redundant är which is the sequence ä + r. (On a related note, you could eliminate îr altogether by symbolizing it as ēr, which is the way Random House Unabridged does it: myriad (mĭr), Sumerian (mēr).)
Suggestion for "Other Symbols" is so noted. Ishwar 20:07, 15 May 2008 (UTC)
  • Basically good I would say. Except I think we should ditch RP and just call it UK, as per the OED and Wiktionary talk:Pronunciation. Widsith 19:38, 15 May 2008 (UTC)
    • I agree that we shouldn't treat RP as "proper". But should we separate RP and standard UK, or just expunge RP from Wiktionary? It may be that pronunciations for many terms were transcribed from RP so it would be wrong to relabel them en masse. Perhaps it would be valuable to keep both RP and modern UK transcriptions for historical/research purposes. This looks like a broader discussion. Michael Z. 2008-05-15 19:47 z
      It is a difficult issue. I add a lot of pronunciations, and always want to provide some clue about UK pronunciations. However, my experience is largely with the RP and very little with other UK dialects. This is primarily a result of the "posh" accents promoted by the BBC, which has been my primary source of information about British pronunciation over the years. I have neither a good ear for nor sense of the sounds used in other UK accents. --EncycloPetey 23:27, 15 May 2008 (UTC)
      The problem is, what you're hearing is probably not really RP. For example, even the poshest BBC newsreader does not say [æ] anymore, but [a]. Phonemes like [æ] and [r] have essentially disappeared from "standard UK" speech. [æ] is a particularly interesting one, since it is a major part of US English, and is in fact one of the primary differences in accent, which is why a word like man sounds very different in London and New York – yet RP transcription makes them look the same. Widsith 14:18, 16 May 2008 (UTC)
      Yes, what I hear now on the BBC is not RP, and I can tell. I'm referring to programs recorded in the 1960s and 1970s that I grew up on, and which even now get occasional airtime. Mostly comedies, scifi, nature, and adventure programs. I also have a DVD collection of those Shakespeare productions done in the 1970s. --EncycloPetey 14:32, 18 May 2008 (UTC)
      Ah, yes – that'll do it. Widsith 14:52, 18 May 2008 (UTC)
      Maybe I was overreacting. I'm okay with ditching RP in the chart, as long as we have a reasonably clear definition of what the UK accent is, and take into consideration other dictionaries' practices. Perhaps no one was suggesting ditching or renaming RP pronunciation in Wiktionary entries. Michael Z. 2008-05-15 23:30 z

pronunciations

I've extracted all the pronunciation info from American Heritage online for my own personal fun. I can convert them to the wiktionary pronunciation guide. Anyone interested in using this info? If so, tell me where to upload it. Ishwar 13:43, 15 May 2008 (UTC)

What copyright restrictions are there? It would be amazing if we are allowed to. Conrad.Irwin 16:33, 15 May 2008 (UTC)
Please don't upload information copied directly from copyrighted sources. Perhaps the audible pronunciation of a word is an uncopyrightable fact. But its phonetic transcription is the creative product of a skilled expert (it isn't deterministic, incorporating a lot of nuance, just like dictionary definitions). Michael Z. 2008-05-15 19:10 z
I don't know the copyright issues.
"copied directly" is defined as what?
So, where does the pronunciation info in wiktionary come from? No one looks in reference books for pronunciation information? If pronunciation information is largely the same across dictionaries, can the origin of the information be ascertained and it be copyrighted? Will any wiktionary entry ever have an analysis resulting in a transcription that differs in substance from a transcription in a published dictionary? Ishwar 20:15, 15 May 2008 (UTC)
Dumping an online database and converting the data to Wiktionary is copying. It infringes on the creator's rights under worldwide copyright laws.
Pronunciations in Wiktionary are composed by editors. They may be created with reference to, with interpretation of, or transcribed from various sources (which often differ). Referring to a source is not the same as duplicating its database.
Read Wiktionary:Copyrights#Contributors' rights and obligations. It clearly identifies what you have the rights to licence to Wiktionary by entering it here. Michael Z. 2008-05-15 20:31 z
When I enter pronunciation information, I basically transcribe my pronunciation of the word in question, with reference to the pronunciation key and the examples therein. Occasionally I will look at online references to see how a particular part of a word has been transcribed or where syllable breaks have been put. Having moved around Britain quite a bit (principally Tyneside, North Yorkshire, Somerset, South Wales and now London), I do not have a particularly strong accent but I don't speak RP, hence I label it "UK". Thryduulf 22:43, 15 May 2008 (UTC)

Meta logo letter discussion link

There's a discussion on meta about changing the Japanese letter in the tile logo image. Best regards Rhanyeia 08:09, 16 May 2008 (UTC)

Topline see alsos for common misspellings

One of the most frequent spelling mistakes I make is whether a word has a single or double consonant in the middle. When looking the word up on Wiktionary, if I guess wrongly and the word I have entered is not an entry then the search results often (but not always) will link to the spelling I intended.

However, there is no such indication if the spelling I entered is a different word. For example, I might be intending to look up the English word barrack but misspell it as barack, which is a Hungarian noun.

Should the topline see also therefore additionally be used to link these words together, in addition to any words which differ solely in capitalisation and/or presence or absence of diacritics? Thryduulf 15:44, 17 May 2008 (UTC)
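
As a rough illustration of the single/double consonant idea, candidate pairs such as barrack/barack could be found mechanically by giving every page title a key in which doubled consonants are collapsed. This is only a sketch under stated assumptions: the function names are hypothetical, the consonant set is plain ASCII a-z, and nothing here reflects how Wiktionary or MediaWiki actually builds its see-also lines.

```python
import re
from collections import defaultdict

# Matches a run of two or more identical ASCII consonants (assumption: a-z only).
_DOUBLED = re.compile(r"([b-df-hj-np-tv-z])\1+")


def doubled_consonant_key(title):
    """Hypothetical key: lowercase the title and collapse doubled consonants."""
    return _DOUBLED.sub(r"\1", title.lower())


def see_also_candidates(titles):
    """Group titles sharing a key; only groups with 2+ titles are candidates."""
    groups = defaultdict(list)
    for title in titles:
        groups[doubled_consonant_key(title)].append(title)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    print(see_also_candidates(["barrack", "barack", "perro", "pero", "cave"]))
```

Note that the key ignores language entirely, so it would pair an English word with a Hungarian or Spanish one; whether that is desirable is exactly the question being asked.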

I think that is a good idea. Conrad.Irwin 15:46, 17 May 2008 (UTC)
It can't hurt, but wouldn't it be better to have an actual English header and a "common misspelling" sense? DCDuring TALK 15:50, 17 May 2008 (UTC)
I'm not certain (I've not checked) that all of these would qualify for a "common misspelling" entry. Perhaps it would have been better to use a description like "simple orthographic variations" but then I doubt that anyone would know what I meant. In English words with sounds like IPA(key): /æɹə/ in them can be spelled either "..ara.." or "..arra...", so it makes sense to link to the alternate. However I don't know how to define this finitely, e.g. how many variations for words with the sound IPA(key): [ʃ] in them do we include? What about the orthography of other languages? Thryduulf 23:30, 17 May 2008 (UTC)
It is first of all entirely possible but only if there is a regular rule that does not depend on the language. See also colour for the page color is not acceptable on the top line unless you would want to see the top line to extend for several lines. See also theatre for the page theater is questionable and probably not a good idea in my opinion. See also perro for the page pero has been suggested before, and it is an addition that I have made myself on a number of pages, even between different languages. More often though I have removed such suggestions as "zero" on the page 0, which I find to be rather annoying. As far as I'm concerned see also may as well be something that's completely automatic, since if a computer can't make the decision then it more than likely doesn't belong. The exception are glyphs.
The question about misspellings more broadly could probably be handled a different way though. I don't see that as exactly being the primary motivation for such an idea. The problem, if one could choose the words, is confusion. A native speaker is not likely to misspell pero as perro, but a learner could easily confuse them. I personally feel that's a pretty good reason to have doubles (yes of vowels and everything else) but I'm not entirely confident in that because I haven't seen how far it would extend. DAVilla 19:49, 19 May 2008 (UTC)
I have a hard time seeing how that would work, since (as noted in the discussion) the "misspelling" may not be common and may be a word in a completely different language. Would this then be extended to cover cases of single/double vowels? other similar spellings? We already have some members of the community concerned that our "see also" includes too much (although I think the current coverage is just about right). I tend to favor the status quo on this particular issue. --EncycloPetey 23:58, 17 May 2008 (UTC)
Perhaps what we should have is a "Not the word you were looking for? See a list of similar words" With that being a list broken down by language and containing words with similar spellings in addition to the different capitalisations, diacritics, leading/trailing -, homophones, etc. If a word with a similar spelling exists in more than one language then it would get an entry in each language section. I don't know how workable this is though. Thryduulf 09:38, 18 May 2008 (UTC)
Search needs improvement. The "see" template, misspellings, and orthographic variant entries incompletely fill the need for a search that handles a fuller range of typos and other user errors. Also restricting search to a user-specific set of languages would be a help to users. DCDuring TALK 13:36, 18 May 2008 (UTC)
We could use a combination of aspell and DidYouMean for this, though I don't currently have the time to play around with them. (For those who haven't tried the WT:PREF "(Experimental) Use the aspell checker on User:Amgine's http:..." that will give you a limited idea of what it is capable of). Essentially DidYouMean will normalise all letters by removing diacritics etc. while aspell uses Soundex and typo checking to work like a normal spellchecker-suggester (in multiple languages). I've had only limited success in getting DidYouMean working on http://devtionary.info/wiki/, though more success with aspell - but it really needs a couple of days of PHP coding to get both of them going again properly. There are limitations though, aspell can only do one language at a time (because each language has different Soundex rules) and running every word through the ~80 dictionaries that it has would take too long, so we'd have to guess the languages to try in advance. If anyone has ideas on how to do this I'd be very interested. If/when we get these two installed there are a couple of other things I'd like to look into, such as de-romanisation of search input (so that searching for luw would also match λύω) and something similar for pinyin. (Though these are long term ideas while I would like to get DidYouMean/aspell going reasonably soon). Conrad.Irwin 13:50, 18 May 2008 (UTC)
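
For anyone unfamiliar with the two techniques mentioned above, here is a minimal sketch of what they do. It is not the DidYouMean extension or aspell itself (aspell's real phonetic rules are per-language and more elaborate); it only shows diacritic stripping via Unicode decomposition and the classic American Soundex code, with hypothetical function names.

```python
import unicodedata

# Classic American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def soundex(word):
    """Return the 4-character Soundex code, or "" if no A-Z letters remain."""
    letters = [ch for ch in strip_diacritics(word).upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


if __name__ == "__main__":
    print(strip_diacritics("naïve"), soundex("barrack"), soundex("barack"))
```

Because this Soundex only knows the letters A to Z, a Greek search term such as λύω yields an empty code, which illustrates why the phonetic rules have to differ per language and script.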
I can't wait. :-)   Wiktionary:Feedback suggests that one of our biggest problems is that users don't know how to spell anything. (It's not our absolute biggest problem, which is that users don't know the difference between Wiktionary and Wikipedia, but still, it's a biggie.) —RuakhTALK 15:30, 18 May 2008 (UTC)
Regarding the 80+ dictionaries, there is a useful tool at http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser that identifies which language a phrase is in (in my experience with about 90% accuracy). Obviously this will be harder with single words, and I've also no idea how the tool works or what its license is. However perhaps the best way to work with soundex is to use a tool like this to suggest the likely languages and give a probability based order to feed soundex dictionaries. Thryduulf 15:56, 18 May 2008 (UTC)
*tries it out* It's very impressive (and very cool!), but it doesn't seem to do decently even with correctly-spelled single English words, and it seems to do (very slightly) worse when they're misspelled. (I tried misspellings that I know from experience, and from Google, to be common.) —RuakhTALK 17:24, 18 May 2008 (UTC)
yes, interesting, but relies on having several words in some way: "wadudu" is Breton, and "wadudu wabili" is Malay? Seems to me that is two bugs. (well of course is ... ;-) "mdudu" is Turkish. as is "mdudu engine" (another bug ;-). Okay, I'm having too much fun. It isn't looking for the individual words. Robert Ullmann 17:47, 18 May 2008 (UTC)

Link prefix for pedia-links: w: vs. wikipedia:

I'd like to suggest that in reader-facing pages (entries, appendices, etc.), we use the link prefix wikipedia: rather than its short form w:. This is because the HTML title attribute (which defines the tooltip text — the text in the little yellow box that you get when you hover over something in out-of-the-box Windows, and the corresponding text in other systems) doesn't do the mapping. That is:

It's not a huge deal, but I think it would be nice if our tooltip text actually said "wikipedia" in full.

RuakhTALK 18:46, 17 May 2008 (UTC)

So, does silence mean that people agree? Disagree? Don't give a darn? Don't leave me hanging here, folks! :-P —RuakhTALK 16:02, 18 May 2008 (UTC)
We use w: everywhere, and routinely correct wikipedia: (and Wikipedia:) to w:. I for one don't care about "tooltips" (which this isn't, it is just the link title), I look at the URL target if I want to see where I would be going. (This applies to every site, not just here) What some site designer wants me to read is less important. If you think WM software should be putting something more informative in the title attribute, that should go to bugzilla. (might break who knows what ...) Robert Ullmann 16:36, 18 May 2008 (UTC)
I agree with Robert. We routinely use w:, and Wikipedia routinely uses wikt: (Wikipedia even has a policy on that). If there is a problem, it's at the Wikimedia level, not at the individual project level. --EncycloPetey 16:51, 18 May 2008 (UTC)
O.K. Thanks, both of you. :-)   —RuakhTALK 17:11, 18 May 2008 (UTC)
On the other hand, it would be nice to have the tool tips corrected. Even if we use w:foo the tip should display as "Wikipedia: foo", no? DAVilla 21:51, 19 May 2008 (UTC)
That's what http://bugzilla.wikimedia.org/ is for. Though they tend to be very slow in answering enhancement requests. Conrad.Irwin 12:11, 20 May 2008 (UTC)

Offensive usernames and how to handle them

Following discussions on User talk:Connel MacKenzie regarding the usernames "Teh Rote" [5] and "Lou Crazy" [6] I think it would benefit the community to have an agreed procedure for dealing with usernames that are or may be offensive. To that end I propose the following:

If you see a username you find offensive or is inappropriate in another way, you must follow the following procedure:

  1. Initiate a discussion at Wiktionary:Beer parlour about the username, explicitly stating why it is inappropriate.
    • If a previous discussion has occurred, you must either abide by that outcome or explicitly state what has changed.
  2. Inform the user concerned on their Wiktionary talk page (and optionally via any other communication methods) that the discussion is taking place.
  3. Wait for consensus to arise about whether the username is acceptable or not, and abide by that consensus.
  4. If there is consensus that the username is unacceptable, the user should be encouraged to seek a renaming to a username that is acceptable. Only if after a reasonable period they have not requested a renaming may they be blocked and/or a forced renaming sought. Any period between having requested a name change in the appropriate place and this change being carried out does not count against the user.

In all cases, the user must not be blocked prior to there being consensus to do so, except where the block is unrelated to their username. Where the username is very obviously inappropriate such consensus will be arrived at quickly, so there is no need to preempt it. Where consensus does not arrive quickly, the username in question is not "obviously inappropriate".

In deciding whether a username is acceptable or not, it is appropriate to take into consideration the contributions and behavior of the user on the English Wiktionary and/or any other Wikimedia wikis the user has been active on.

Thryduulf 15:38, 18 May 2008 (UTC)

No good; way too much process. The 100% case is usernames that should be blocked instantly, out of hand. The problem here is simply that "Lou Crazy" and "Teh Rote" are simply not offensive, and should never have been blocked, or even considered to be blocked. And that is just an issue to take up with CM, as it has been. (And previously, when he blocked User:Dustsucker, and I unblocked, it is a literal translation of Staubsauger ;-) In the cases where the above might apply, the username should simply not be blocked; the user could be asked to change the name to something else. Robert Ullmann 15:48, 18 May 2008 (UTC)
+1 —RuakhTALK 15:57, 18 May 2008 (UTC)
Totally over the top. I shall continue to shoot first and ask questions later. (any usernames implying continual drunkenness will get the harshest treatment). SemperBlotto 15:51, 18 May 2008 (UTC)
Typical of naysayers. I can't help but chuckle at where this is coming from. Any username with leet in it, indeed, still should be indef'd. This is a web resource dedicated to compiling information about the English language, which leet still is not part of. Those who wish to advocate it, are positively pushing a POV that honestly has no bearing on the English language itself. But all that aside, when one sysop suggests that the block reason itself isn't good enough and unblocks another sysop's block (note that there invariably are additional factors) they probably should not be sysops. --Connel MacKenzie 16:02, 19 May 2008 (UTC)
Yes, this is a resource for English. English only? No, every word in every language. (Okay, some constructed languages are not included because they aren't actually used to communicate, but) if people use leet, then it should be included. And for a few words, it is. It doesn't slip through the cracks because you think it isn't English. If it's attestable and it isn't English, then pray tell, what language is it? What language are ain't and alright and amscray?
Yes, we all know it's silliness, but what is offensive about silliness? To some individuals such a username may not give a preferable impression, sure. But do you count it as a warning to check a person's contributions, or do you count it as a first black mark? Would you consider me, for instance, to be a silly person or a serious person? And does it change who I am because my username is DAVilla or ∂ανίΠα? Believe what you may, I submit that it does not any more than a person can hide his or her true personality. The truth is, somehow that always comes across.
Connel, if you think Thryduulf shouldn't be sysop, you might make a better case by giving an example of an unblock that wasn't justifiable. I have to agree though that in general such actions should be taken very cautiously. I'm assuming, of course, that you would be willing to unblock the person yourself if the issue came to your attention. This is a good lesson though. It's not good to have admins taking unilateral actions, on either side, when there is question. DAVilla 19:22, 19 May 2008 (UTC)
I was the unblocker, so I assume Connel refers to me. I decided to unblock because I felt that the block was incorrect and harmful to Wiktionary, in that it risked us losing yet another contributor. Had the block been given when the user initially signed up I would not have revoked it, as that is unlikely to cause particular insult, however I feel that such a block should not be given after the user has become entrenched. (Call me flighty if you want, this is just what I think). Before unblocking I checked with a few others on IRC at the time that this was acceptable, and it seemed to be so, however I am not placing the blame on these people as I would have unblocked anyway had no-one been around. Conrad.Irwin 19:51, 19 May 2008 (UTC)
(off topic) In a more general sense, I disagree with the "Wheel War" rule. It is true that admin actions rarely require undoing, as they tend to know what they are doing, they are our most experienced contributors after all. However in some cases they do need to be undone, and because adminship should not be a "special" thing (I reckon that the majority of Wiktionary regulars are admins - though I notice that WT:VOTE lies still recently), there should be nothing special about the admins or their actions. Conrad.Irwin 19:51, 19 May 2008 (UTC)
Although I think "Lou Crazy" and "Teh Rote" are acceptable usernames, I think that requiring an admin to submit his proposed block for community approval is too much red tape; also, if he was given the sysop flag, he is presumably trustworthy to block appropriately; also, the block can always be removed if needed. I do think that the text that a blocked user sees should inform him how long the block is in place, and that other admins might remove it if they think it was placed inappropriately. (This might prevent unfairly blocked users from leaving Wiktionary forever, when they might have good contributions to make.)—msh210 17:57, 19 May 2008 (UTC)
Red tape: bad; blocking user names indefinitely: bad; leet: not so bad.
Real problem: Wiktionary needs more users and contributors. Wiktionary is not nearly as popular as MW online or Answers.com. If I were trying to ensure that Wiktionary be no long-term competition to the other on-line dictionaries, what would I do? I would do what I could to discourage an active, broad user community such as Wikipedia has, by aggressively and opaquely policing new users in any way that I could. Delete their entries (preferably without explanation), block them, harass them, provide little welcome or help. Discourage entries of popular terms, especially those that appealed to the young. Try to make it more like the OED than Wikipedia. This would be vastly more successful than having no Wiktionary, because it would actually prevent the "Wiki" brand from being used to establish a real competitor. DCDuring TALK 18:20, 19 May 2008 (UTC)
You paint a rather rosy picture of Wikipedia. Keep in mind that Wikipedia does many silly and irritating things as well. Discussion happens on every individual article on the site. Voting that affects hundreds of pages and several wikiprojects happens without notification in obscure locations, to be concluded rapidly before more than a tiny fraction of uninformed voting rats chew over the issue. Lots of sockpuppetry. Wikilawyering. And, my personal peeve, templates plastered mindlessly onto every single page that lacks a reference. At least here, the obvious or easily checked items are mostly left alone. On Wikipedia, even a one sentence stub will get a massive template demanding references and dwarfing the poor little stub. Yes, Wiktionary has problems, but only different problems, not more of them. --EncycloPetey 22:03, 19 May 2008 (UTC)
Well, yes, but they have 60+ times the user visits. I think most people used a print dictionary more than they used a print encyclopedia. Why don't we have as much usage as they have? We're not growing much faster than they are either, apparently. IOW, they are getting something for all of the abovesaid misery. It is not obvious exactly what bold new direction would really help, but it seems that we ought to be willing to rethink some of the long-standing policies. Is there a wiktionary Google Tool? Wiktionary look-up add-ons for Writer and Word? More user stats? Do we need fact-based user personas to help us focus on actual users instead of using our own preferences (now the sole basis for entry design)? I think we can have even more fun by getting more folks contributing and having more visits and more disagreements. DCDuring TALK 23:56, 19 May 2008 (UTC)
Your argument assumes that Wiktionary and Wikipedia have comparable content. We do not. There will be 60 times as many people looking up Britney spears as are looking up the meaning of "incontrovertible". The fact that we are growing at a comparable rate, despite having fewer contributors and despite being less "sexy" speaks to a real strength here, not a weakness. --EncycloPetey 02:13, 20 May 2008 (UTC)
I think we should be well pleased that we have fought our way to being the 4th most visited dictionary site on the web, at least according to these figures. DCDuring TALK 04:52, 20 May 2008 (UTC)

Wiktionary:Entry layout explained

Move to Wiktionary:Manual of style, which is in keeping with basically every other wikimedia project. Nwspel 20:20, 19 May 2008 (UTC)

No. Only Wikipedia has a Manual of Style, it is a page consisting of general guidelines about the finer points of writing in a way appropriate for an encyclopedia. WT:ELE is not a style guideline, it is a structural policy. I think there is plenty of room for us to have a Manual of Style too, it's just that no-one has got around to writing it yet. Certainly some instruction into the best way to write definitions, what makes suitable example sentences and how to make Etymologies clear is beyond the scope of WT:ELE, but should be included in the manual of style. Conrad.Irwin 20:33, 19 May 2008 (UTC)
The ELE deals with issues of layouts of a page; something which the MOS does. I see no difference enough to keep them separate. Nwspel 20:35, 19 May 2008 (UTC)
We need to separate policy from general advice. The Wikipedia manual of style is general advice, and there is (as above) plenty of scope for one of these on Wiktionary. We should not make WT:ELE any larger by including the irrelevant points. A second reason to avoid having it at the same title is that Wikipedians already assume that Wiktionary is just a Wikipedia with a different name; trying to hammer the differences into them is hard enough without confusing them by calling different things the same. Conrad.Irwin 20:39, 19 May 2008 (UTC)
I think we are more like wikispecies (See WikiSpecies:Help:General Wikispecies) than we are like wikipedia. The structure and interrelationship of our entries is relatively more important than at WP. In our case, we are trying to have consistency of entries so that a user can quickly find what they want. A dictionary is used more often and briefly than an encyclopedia. DCDuring TALK 20:36, 19 May 2008 (UTC)
The fact that Wikipedians already get so confused by the system here is surely reason enough to try to clarify things. At no circumstance should there be two separate pages. I think the issue is whether to call the page the ELE or the MOS. Nwspel 20:42, 19 May 2008 (UTC)
Every project is an island. We do things here in the way we find best for us to accomplish our mission, and Wikipedia does the same. Forcing one project to do something just because another project does it will just hinder the many projects which are better served doing something a different way. - TheDaveRoss 20:44, 19 May 2008 (UTC)
Agree with TDR. We should be allowed to do things our own way, completely regardless of how the 'pedia happens to do them. -Atelaes λάλει ἐμοί 20:46, 19 May 2008 (UTC)
There are some reasons to tilt toward WP's structure, especially for our most basic pages:
  1. they have 60 times the visits that we do and may be doing something right.
  2. they are a prime source of recruits since we don't seem to do all that well in outreach to the general population.
This principle should apply to all the pages that wikipedians expect. Perhaps all of the WP shortcuts should redirect to the closest corresponding WT page. DCDuring TALK 20:59, 19 May 2008 (UTC)
Disagree with DCDuring (seems like I've been doing that a lot lately, nothing personal :-)). The 'pedia has different aims and different needs than we do. Should we change WT:ALA to redirect to something about Alabama as the 'pedia does? No. Why? Because Alabama is not a major part of what we do here, but Latin is. Our shortcuts should reflect our priorities and needs, not the 'pedia's. -Atelaes λάλει ἐμοί 21:07, 19 May 2008 (UTC)
That's a slightly different example to moving ELE to MOS, since the MOS seems to be pretty uniform pan-wiki. Nwspel 21:11, 19 May 2008 (UTC)
That's not what I'm seeing. What I'm seeing is that every project has their own name for their major formatting page, with WX:MOS linking to it. I have no problem with WT:MOS linking to ELE, but we should not change the name of the page. -Atelaes λάλει ἐμοί 21:19, 19 May 2008 (UTC)
WT:WS should not be redirecting to Wiktionary:Shortcut, since there is a more logical redirect for that (WT:SC), and since WT:WS is much more needed for Wiktionary:Wikisaurus. Nwspel 21:56, 19 May 2008 (UTC)
Based on what reasoning? As Conrad has already wisely said, the shortcuts are for editors, not readers. Do you have some evidence that this assertion is the case? -Atelaes λάλει ἐμοί 22:17, 19 May 2008 (UTC)
Based on logic, not tradition; something I hope this wiktionary would adhere to. The fact that some users may be used to WT:WS directing to Wiktionary:Shortcut is irrelevant. There is now a better shortcut for that, and that shortcut can be used on the more needed case of the Wikisaurus. Nwspel 22:38, 19 May 2008 (UTC)
But the logic is not that simple. Tradition is useful in redirects because it's what our editors are used to and will think to use (and making things easier for the editors is what these things are here for). If there's a huge demand to be linking to Wikisaurus pages, then fine. However, it's my impression that wikisaurus is a dead project. Those pages have all turned into piles of nonsense, and I think people simply prefer to put synonyms into the main entries, so that they receive the same scrutiny that everything else does. Now, I wouldn't expect you to know that, as you're a new user, but that's the point we've been trying to convey, namely that new users should not come in and, with little experience actually working on wikt, start revamping our background shortcuts. -Atelaes λάλει ἐμοί 22:49, 19 May 2008 (UTC)
If it's dead, should you not delete it? You also made an interesting point; you say that new users do not understand about the project properly - but that shouldn't be the case, since the wiki should be aiming to make itself usable by anyone, not just someone who has spent months in the system. Nwspel 22:56, 19 May 2008 (UTC)
Delete wikisaurus, hmmm......that's not a bad idea. And I didn't say that we shouldn't make the project easier for newbies to get into. I said that newbies shouldn't presume to make major policy and structural changes. -Atelaes λάλει ἐμοί 23:02, 19 May 2008 (UTC)
I was only trying to help :/ Nwspel 23:10, 19 May 2008 (UTC)
Newbies get to vote just like anyone else and participate in the discussion. Our braver newbies also stir the pot. Veterans may discount some newbie opinions. I would argue that, despite the risks and the rehashing of old matters, we need more committed newbies stirring the pot. Despite all that's been accomplished, Wiktionary does not yet have laurels worth resting on. DCDuring TALK 00:04, 20 May 2008 (UTC)
Even if you keep "ELE" instead of "MOS", I still have a bone to pick with the name. We don't call Wiktionary:Administrator, "Wiktionary:Administration explained", or Help:Tips and tricks, "Help:Tips and tricks explained", so I see no consistent reason in keeping the "explained" bit on the end of the title. Many of you will sit there reading this saying to yourselves "What's he on about, it sounds completely fine" - but that's because you're used to it; if you looked at it from a non-wiktionary-native view, it would seem very illogical and inconsistent. Wiktionary:Entry layout would be much more appropriate.Nwspel 09:25, 21 May 2008 (UTC)
If we change the name as you suggest, then the shortcut would be WT:EL, but el is the ISO code for Modern Greek. The shortcut would become confusing to regular users (who are the most likely to use it). In any case, it is the page where entry layout is explained. Entry layout is one item here that really is so complex that it cannot be simply presented, but must be explained. The current title is logical. --EncycloPetey 13:41, 21 May 2008 (UTC)
"Tips and tricks explained" isn't an analogue of "Entry layout explained". One page is a page of tips and tricks (tips and tricks are its contents), while the other explains entry layout. I take your point, though, that most Wiktionary: pages don't have "explained" or the like at the end of their respective names. So what? They don't have to match each other. And it's not like someone is more likely to look at Wiktionary:Entry layout than at Wiktionary:Entry layout explained: neither is intuitive. (Nor is anything else. That's why we have links to policy pages: so people don't need to guess their names.) So there's really no need to change the name of the page; and it will serve to confuse those who know where it is already. That said, I see no reason not to make Wiktionary:Entry layout a redirect to ELE. As a side point, part of the reason I left WP and now edit (almost exclusively) here is that there was too much focus on administrativia (stub sorting, anyone? continual arguments over VFD procedures? etc.) and correspondingly not enough on making an encyclopedia. (Contrast [7] with [8].)—msh210 16:39, 21 May 2008 (UTC)
What do you mean about the comparisons with my edits? And ID is the ISO code for Indonesian, but it doesn't stop us having WT:ID, so that is not a valid point. If we don't have "Wiktionary:Administrators explained", then why should we add the explained onto this? Nwspel 18:14, 21 May 2008 (UTC)
Your logic is still flawed for the reasons I stated above. The Administrators page does not "explain" administrators; it merely lists them and allows for voting. By contrast, ELE does explain (in detail) aspects of Entry layout. Do you really believe this one small point is critical to the development of Wiktionary? I won't be bothering to discuss it anymore, because I do not think it deserves this much attention. --EncycloPetey 13:49, 22 May 2008 (UTC)

WikiSaurus

Proposal

Deletion of WikiSaurus, or the merge of it into mainspace articles. Nwspel 15:35, 20 May 2008 (UTC)

Poll

merge/delete keep
  • keep - msh210 [Note that I did not add my name to this list; it was added for me, based, I suppose, on what someone thinks my opinion is. I object to such secondhand vote-casting, so will not dignify this poll by voicing my opinion in it.—msh210 18:01, 21 May 2008 (UTC)]
  • keep (?) - EncycloPetey
  • keep - DCDuring TALK 11:34, 21 May 2008 (UTC)
  • keep - Amina (sack36) 07:38, 27 May 2008 (UTC)

Discussion

WikiSaurus is a useful place for people to play about adding stupid words - it's not part of the REAL dictionary (IMHO). SemperBlotto 16:27, 20 May 2008 (UTC)
Despite all the bad stuff in it, it does have some good stuff. Shame to be rid of it. (Incidentally, I agree with everyone else who said not to rename ELE. No objection to having WT:MOS redirect to it, though I see no need at all.)—msh210 16:18, 20 May 2008 (UTC)
I thought the whole point of wiktionary was to contain everything in one article, given the whole policy thing about "wiktionary/wikipedia" not being confined by the normal "paper" issues, etc, meaning that the synonyms should simply be included in the main wiktionary articles themselves. The general impression I get from several people here is that WikiSaurus is a dead project, and the little that is on there is rubbish. Nwspel 17:06, 20 May 2008 (UTC)

We voted on the issue of Wikisaurus, IIRC. Changing that would require another vote. And no, we don't put everything on one page. For example, we have a separate namespace for citations. Wikisaurus allows us a space to list synonym sets that would otherwise be repeated on countless pages with slightly different meanings, and require the format and lists of those synonyms to be kept in synch. Wikisaurus was started to solve that problem by allowing a freer format for presenting and aligning synonyms, near synonyms, antonyms, and such. The project has not gotten far because the one person who helped set up the pages has stopped work, and few people have since shown interest in writing new pages (except for slang terms of body parts and sex). We don't delete content just because it isn't being actively edited (or we'd delete huge numbers of entries). A better solution is to locate interested individuals to add more pages to the project. --EncycloPetey 18:07, 20 May 2008 (UTC)

However, considering no good work has been put into the project in a great while, and a lot of nonsense has accumulated, it might be worth reevaluating. When Nwspel (perhaps only jokingly) proposed that it be deleted, there were a number of "ay"s on IRC. I must admit, right now the whole thing feels like an embarrassment to the project. I would put up no opposition to simply deleting the whole thing. I am curious as to what others think of this. -Atelaes λάλει ἐμοί 18:19, 20 May 2008 (UTC)
Or merge the content of the pages there into subsections of the mainspace pages. Nwspel 18:14, 20 May 2008 (UTC)
I wasn't joking about deleting it Atelaes; in fact, you were one of the several people that helped me come to the conclusion that something should be done about it, from when you were discussing about the shortcut system with me. Nwspel 18:23, 20 May 2008 (UTC)
Merge useful content (if any) back into mainspace, leave redirects for a while. Then remove namespace. Isn't this one of those basic no-nos of database management (having separate fields that have to sync'd manually)? Anyway, it's not working. Better to keep this content in plain view, where everyone can keep an eye on it. -- Visviva 03:09, 21 May 2008 (UTC)
I have similar misgivings about Citations, but note that it wasn't supposed to be used as the default holder for all citations, only as a sort of incubator/scratchpad, particularly for words and senses that didn't yet meet CFI. -- Visviva 03:09, 21 May 2008 (UTC)
Um, if we're considering this as a matter of policy, it needs a formal WT:VOTE. If we're just considering general opinions about the current state of Wikisaurus, and how specific existing pages might best be handled, formal tabulation of positions isn't really the best course. Either way, I don't think the table above is very helpful. -- Visviva 09:07, 21 May 2008 (UTC)
Of course this is no substitute for a formal vote, but as long as that's realized, I don't see how it's harmful (barring the possibility that people's positions might be misrepresented). -Atelaes λάλει ἐμοί 09:10, 21 May 2008 (UTC)
If the results from the table above show that there is some movement for action, then it shall go to a formal vote; the voting above is simply so we can get an idea of whether or not there is enough support to put it forward.Nwspel 09:19, 21 May 2008 (UTC)
Before a formal vote can happen we need to know what the options entail. How do we handle pages like Wikisaurus:penis/more, for example. Also is it possible that the Wikisaurus space could be put to a better use, for example by providing transclusion for sets of synonyms on mainspace entries (like the templates work, so {{Wikisaurus:anger}} on an entry could provide some of synonyms for anger, rage etc.)?
You make interesting points, but my opinion is that most of those words listed there are rubbish, and would not even warrant a chance on an RFV. We should gradually be working through the words, checking to see if they can be verified; if they cannot, delete them; if they can, make the page, and link it in the see also section or whatever, to the other word. Nwspel 11:21, 21 May 2008 (UTC)
1. By deleting them, like the irredeemable cesspools they are. (Or filtering them gradually for any valid content that may somehow have found its way in.) 2. This is a great idea and I'd like to see a pilot, but it runs into the same basic problem as the current Wikisaurus: the basic unit of synonymy is not the word, but the sense, and our arrangement and glossing of senses is in constant and necessary flux. That probably isn't a big problem for "anger" but can be an enormous problem for many words. Also, as somebody mentioned somewhere recently, synonymy is not reliably transitive; a one-size-fits-all template may not suit all members of the set equally. -- Visviva 12:01, 21 May 2008 (UTC)
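The two objections in the comment above (synonymy attaches to senses rather than words, and is not reliably transitive) can be illustrated with a minimal sketch. Everything in it is hypothetical: the sense glosses, the `SYNONYMS` table and both function names are invented for illustration and do not describe how Wikisaurus or any Wiktionary template stores data.

```python
# Hypothetical data: synonym links recorded per (word, sense), not per word.
SYNONYMS = {
    ("anger", "strong displeasure"): {"rage", "fury", "ire"},
    ("rage", "violent anger"): {"fury", "anger"},
    ("rage", "current fashion or fad"): {"craze", "fad"},
}


def synonyms_of_word(word):
    """Merge the synonyms of every sense of a word, as a word-level list would."""
    merged = set()
    for (headword, _sense), syns in SYNONYMS.items():
        if headword == word:
            merged |= syns
    return merged


def naive_transitive(word):
    """Follow word-level links one step further, ignoring which sense was meant."""
    direct = synonyms_of_word(word)
    result = set(direct)
    for synonym in direct:
        result |= synonyms_of_word(synonym)
    result.discard(word)
    return result


# Treating synonymy as word-level and transitive pulls in "fad" for "anger",
# although no sense of "anger" in the table lists it.
print(sorted(naive_transitive("anger")))
```

Keying the table by sense keeps the "fad" sense of "rage" away from "anger", but it also means the table has to be revised whenever the senses of an entry are rearranged, which is the maintenance problem described above.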
Wikisaurus, like some of our Appendices, is a useful location for certain potential entries that might not warrant inclusion in principal namespace. As such, it might again serve to involve in Wiktionary some who might become principal-namespace contributors. I don't know whether there is any software or structure that could be developed that would allow Wikisaurus to benefit from the synonym work in principal namespace and also allow it to have lower quality material as well without risking corrupting principal namespace. DCDuring TALK 11:33, 21 May 2008 (UTC)
I don't think Wikisaurus lists should be including anything that doesn't meet CFI; our tacit allowance for this is a big part of why the 'saurus has become such a steaming pile. -- Visviva 12:01, 21 May 2008 (UTC)
Or to put it another way, to allow unverifiable content on WS as a matter of course would do an active disservice to our users, by providing them with information that is very likely to be wrong. IMO this is actually worse than providing no information at all. -- Visviva 12:28, 21 May 2008 (UTC)
There are several hundreds of words in the WS that have not been verified, are probably mostly wrong, etc, etc, etc, so I find myself agreeing with Visviva. Wiktionary itself has sections for synonyms in its mainspace articles; there is no reason to repeat the same thing twice; once on mainspace, once on WS; except, in the latter, several hundreds of other "words" seem to have been added that are not verifiable in any way, and as we merge the "project" into the mainspace, will have to be verified. Nwspel 17:55, 21 May 2008 (UTC)
About synonyms - for those who learn English as a second language, listing synonyms is just one half of the equation. Another very important aspect would be describing the usage of each and explaining what situation they are used compared to each other (formal, informal, written, spoken, positive, negative, etc.) When looking at a group of synonyms, one (maybe the most common) could be used to hold these descriptions, while the other could just point to it. --Panda10 18:29, 21 May 2008 (UTC)
But WS does not do that anyway. Nwspel 19:18, 21 May 2008 (UTC)
(Not that you're doing it, but...) I hate it when I hear anyone say that something is done the "correct" way or not about Wikisaurus. For instance, I've been yelled at for using titles with parentheses, whose purpose was to "disambiguate" meanings. Thankfully that's been resolved by saying that the page can be devoted to the primary meaning of a word, but I still feel that the way I had done it would have been pretty cool. Wikisaurus is not established enough for creativity to be stifled, and Panda10 has a great idea. In fact I like the American Heritage Dictionary for precisely that reason, the way that it explains synonyms. DAVilla 01:34, 26 May 2008 (UTC)
Hear, hear. The /more pages can be deleted or at least ignored, and I'm all for shooting any red links on sight. On the other hand, there are a number of terms in thesauruses that are not necessarily idiomatic, so this isn't the perfect solution. Requiring blue links is probably a good enough starting point, though. It's been brought up before, of course, and I think the main objector had been RichardB. In my opinion he was right about most of the terms he wanted to include, but RFV has progressed substantially since the days that some of the earlier bickering had taken place.
I would rather the Wikisaurus space sit unused than have it removed. It needs to happen, and one of these days someone is going to come along and make something out of it. The best proposal I've heard is to import a public domain thesaurus, much like Webster's was used to jumpstart the dictionary. It would also be great to have bots running around and adding things to the Wikisaurus that are synonyms in the main dictionary space. Of course bots usually require some structure first though, so this would probably need some human guidance if it were to be done now. DAVilla 01:34, 26 May 2008 (UTC)
There's a lot of importable synonym content in Webster's, such as at [9]; that might be a good place to start. -- Visviva 04:22, 26 May 2008 (UTC)
  • I've taken a stab at Conrad's (? unsigned) suggestion above; see Wikisaurus:anger and anger#Synonyms. I created {{comma list}} for the occasion; I hope it will also find other applications. This approach looks promising, and has the advantage that it would actually reduce fragmentation of content... BTW, I'm not much with layout or coding, so if anyone wants to fix these up, please feel free. :-) -- Visviva 04:22, 26 May 2008 (UTC)
See also the very old idea at User:Conrad.Irwin/anger, I couldn't get it to work nicely enough for most cases, though it works well in a few - your idea looks a lot neater though. Would it be possible to change the heading from ====Synonyms==== to ====Thesaurus====, and then transclude the synonyms, antonyms headings so that the normal edit links work by editing the Wikisaurus page? Conrad.Irwin 12:22, 27 May 2008 (UTC)
This is a nifty notion, but I'm afraid it would be problematic for most words, since monosemy is unusual for most core (or even semi-core) vocabulary, and even if we have a "Thesaurus" heading we can't very well have multiple such headings under a single POS. An advantage of the comma list is that it can be spliced into existing entries without breaking up the existing ontology. -- Visviva 15:45, 27 May 2008 (UTC)
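The idea discussed in this thread, one shared synonym list spliced as a comma-separated line into each existing entry, can be sketched roughly as follows. This is not the code of {{comma list}}, whose actual behaviour is not shown on this page; the `SHARED_LISTS` table, the function names and the word list are all assumptions made for illustration.

```python
# Hypothetical shared list, standing in for a page such as Wikisaurus:anger.
SHARED_LISTS = {
    "anger": ["anger", "fury", "ire", "rage", "wrath"],
}


def comma_list(list_name, headword):
    """Render the shared list as comma-separated wikilinks, omitting the headword."""
    terms = [term for term in SHARED_LISTS[list_name] if term != headword]
    return ", ".join(f"[[{term}]]" for term in terms)


def synonyms_line(list_name, headword):
    """Build one bullet that could sit under an entry's existing Synonyms heading."""
    return f"* {comma_list(list_name, headword)}"


# The same stored list yields a slightly different line for each entry.
print(synonyms_line("anger", "anger"))
print(synonyms_line("anger", "rage"))
```

Because the list is stored once, a correction made there reaches every entry that uses it, and the rendered line fits under a heading the entry already has, so no new section structure is needed.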
Thesaurus Flunky below holds my comment. Amina (sack36) 08:09, 27 May 2008 (UTC)

closed captioning

At Talk:d'oh, a user mentioned that "annoyed grunt" is how that entry term is indicated in the closed caption. This would appear to be an excellent possible source for some other non-verbal entries. How should such information be shown in an entry? Does anyone know of sources of information about how the meanings of meaningful non-verbal sounds are communicated to the hearing-impaired? DCDuring TALK 11:02, 24 May 2008 (UTC)

Subtitles in DVDs are also an excellent "permanent" publication record for this sort of thing. However, one has to be aware that both closed captioning and subtitles are prone to errors. I once watched an episode of "Seinfeld" while out with friends. (Note: this was one of only two episodes I ever watched at all because I can't stand the program). The television on the wall was muted for sound and the closed captioning was turned on. As the program went on, the captioning became more and more garbled, until it was a string of nonsense characters and symbols. As far as I can tell, some closed captioning is recorded ahead of broadcast, but other times it is typed "live" as someone hears the program and types what they think they heard, with all the potential for errors that implies. I have seen problems in subtitles as well, such as typing the original script rather than the recorded performance, or the editing out of content. There is a principle among those who write subtitles for translated films that the subtitles should not exceed a certain length, in order to permit them to be read by the audience. This principle is sometimes applied to transcribed subtitles as well. --EncycloPetey 13:51, 24 May 2008 (UTC)
Live closed captioning is usually done with automatic speech recognition software, which feeds directly into the captioning encoder, i.e. with no way for a human to edit. Some TV productions take the time to take the script, revise it to reflect the actual recorded program, and then feed the edited version into the caption encoder. But quite a few just shove the tape into the automatic software, and use the unedited results. Clearly they don't care; they are just meeting the requirement via the broadcasters that material be closed captioned. This no doubt explains the Seinfeld episode; there was some problem with the audio, but they only checked the beginning, or didn't bother checking it at all. Robert Ullmann 14:03, 24 May 2008 (UTC)
I hadn't heard of automatic closed captioning, but the manual work varies widely in quality, apparently a great deal of it being at the low end of the scale (there are also subtitles and audio description which may be useful references). More info at Joe Clark and Captioning Sucks. Michael Z. 2008-05-24 14:51 z
Glad I asked. Thanks for the assessments, cautions, and links. "d'oh" may be a bit of a special case because it is the signature catch-phrase of the show. DCDuring TALK 17:52, 24 May 2008 (UTC)
Given the volume of Simpsons-related merchandise, I'd be surprized if several printed sources weren't readily available in support. --EncycloPetey 19:17, 24 May 2008 (UTC)
The written "D'oh" appeared in the show itself, in the episode about how Lisa got her saxophone; Homer accidentally says "d'oh" while reciting the inscription for the sax, and it ends up in there. If anyone can track that down, it's citeable in-canon. —RuakhTALK 02:08, 25 May 2008 (UTC)
I was interested in transcripts generally as a possible source of the meaning of non-verbal expressions, such as "d'oh". If the transcript or script says that is supposed to be "annoyed grunt", that is meaningful, but not a definitive statement of how viewers interpret and use the expression. Non-verbal expressions are different from words, in that the words carry most of the full load of meaning, but these non-verbal expressions do not. Some, but not actors, would say that words "speak for themselves". I don't think that is as true of such expressions. That is why I was hoping that "annoyed grunt" and other clarifying statements of the script-writer's intent would be available for a range of non-verbal expressions. Unfortunately, that does not seem likely to be the case. DCDuring TALK 04:02, 25 May 2008 (UTC)
It's normally put as (ANNOYED SOUND) or (ANNOYED GRUNT) in relay calls, if that helps anyone. --Neskaya talk 00:18, 26 May 2008 (UTC)
Is that done to avoid special-character issues with d'oh or for other reasons? DCDuring TALK 01:24, 26 May 2008 (UTC)
Partially to avoid special-character issues, I would assume. However, d'oh could be portrayed as d oh, which is usually how things like "I'm" are portrayed (i m). I never actually thought about the particular reasoning issues there. Do closed-captioning services have apostrophes, though? I have never seen apostrophes used in the captions for films at school or anything. --Neskaya talk 00:11, 27 May 2008 (UTC)

Thesaurus flunky

I'm looking for a word or words to work on and an area where that should be done. I finally got my User page up so you can check that if you need to.

I didn't know if the wiktionary method of proceeding was the same as the wikisaurus method was the same as the wikipedia method. I've begun the process of reading my welcome but dry stuff takes me a while. I will get through it, though. Amina (sack36) 07:36, 27 May 2008 (UTC)

I wrote this before I had read the relevant section here. I have now done that. The one thing that Wikisaurus would have that a synonym spot in wiktionary doesn't have is the ability to go beyond the mind set that other people have. It drives me crazy not having a Roget setup in a thesaurus. What if all you can think of is slingshot and what you really want is the medieval siege weapon, mangonel? With a Roget's you can get to it eventually. With the dictionary type of thesaurus you have slim chance to none.
So. I'm willing to volunteer the time to clean up as much as possible and work on valid words. I don't make this offer lightly. I know there will be a load of hard work and long hours. I still want to give it a shot. If I haven't made a difference in a month, I'll cry crocodile tears while you delete or merge.
Go for it! We've all kind of given up with it, as it's been a mess for too long. Be as bold as you like, and I look forward to seeing the results. Conrad.Irwin 12:16, 27 May 2008 (UTC)
Agreed. While I have grown so used to the low quality current state of Wikisaurus that I remain skeptical of it becoming a useful project, I have to admit that there is always a chance that someone can turn it around into something beautiful. The best of luck to you. -Atelaes λάλει ἐμοί 18:09, 27 May 2008 (UTC)
Cool. Go for it. There could be real value in having a thesaurus-style presentation of the data already in Wiktionary, judging from the number of requests for synonyms and from Feedback. A thesaurus could also help us identify missing senses of words that are in Wiktionary. DCDuring TALK 18:14, 27 May 2008 (UTC)
Under ws|improvements Preview, I have put together a plan of action for cleaning up and making manageable the wikisaurus project. Could you read through and give me your impressions? Amina (sack36) 02:26, 29 May 2008 (UTC)
I assume you mean: Wiktionary:Wikisaurus/improvements? --EncycloPetey 03:53, 29 May 2008 (UTC)
Um... yeah. Sorry about that. Amina (sack36) 04:19, 29 May 2008 (UTC)
Could I propose a change to the header of Wikisaurus? Its purpose is to concatenate the introduction and allow the beginning available words to show on the page without scrolling. You can see the proposal at Template_talk:saurus-head Amina (sack36) 00:27, 30 May 2008 (UTC)

What brought me back

I just stumbled onto a TED talk that calls for more amateur lexicography (word-hunting, with context). It inspired me to spend more time at this site. I thought others might be interested in the video. --Polyparadigm 01:33, 29 May 2008 (UTC)

It is a cool presentation. It was fun to listen to it again. It would be interesting to figure out how we can be less traffic-cop-like and not be over-run with vandalism. Erin's message is very positive and inspiring. DCDuring TALK 03:49, 29 May 2008 (UTC)

Scholarly hypersensitivity or sophistication

"False cognate" is certainly NOT an accurate description of these terms. False cognate is not the same as "not cognate". A false cognate implies that there is or was a significant number of people who believed the words were cognates. Examples of false cognates are English dog and Mbabaram dog; English mama and Quechua mama. I don’t believe anybody thinks is related to English mow, or that -다 is related to English do. They are not false cognates, they simply are not cognate. To label them false cognates means that there is a group of people who think, or once thought, that they were cognate. —Stephen 18:45, 29 May 2008 (UTC)

-- Quoted from User talk:KYPark#Korean "false cognates"

``I don't believe anybody thinks is related to English mow, or that -다 is related to English do.``

I don't believe anybody thinks the mistake involved in the above quote very seriously, but hypersensitive scholars may do. In what ways?

Korean verbs and adjectives end with -다 (-da), which is thus similar, analogous, equivalent, or related (just functionally, hence NOT in Stephen's genetic sense) to the French ending /-er/, the Germanic /-en/, AND the English eccentric preposition /to/ but NOT "do" as Stephen mentioned. So granted, the French and Germanic postpositional endings are more equivalent to the Korean counterpart than the English prepositional /to/. Is such comparison entirely meaningless from the point of view of w:general linguistics, w:comparative linguistics, w:universal grammar, or just from the popular point of view of curiosity? I don't believe anybody thinks it is so.

Dutch doen, English do, and German tun are equivalent to Korean 하다 (ha-da). Meanwhile, they are cognate with Greek θέτω (théto, to put, to set, to place), Lithuanian deti (to put), Czech diti (to hide), Polish dziać (to happen), Russian деть (to put, to place), etc. In this archaic sense, they are equivalent to Korean 두다 (du-da). Hence a striking equivalence, both semantic and phonetic, with English. Whoever reads this, including Stephen and Ivan, would never ever forget Korean 두다 forever hereafter. What a mnemonics!

On the basis of the above discussion, sophisticated or hypersensitive scholars could, would, or should set up a w:straw man or more to stand for, or in the image of, me, namely, cognate, false cognate, not cognate, etymology, IE-Korean nexus, IE-Korean genetic, Ural-Altaic, or whatever sophistication.

Wikimedia in general aims to serve for the average readers or public in general rather than sophisticated scholars. Then, the dichotomy of "false cognate" and "not cognate" would be too sophisticated for them, but that of "true cognate" or "false cognate" would do. Yeah, it is surely possible to define "false cognate" as such as Stephen, in contrast with "true cognate" AND "not cognate," especially when needed to set up a straw man.

In corollary, all scholarly arguments are more or less of such dirty arguments. Frankly, I am supposed to practice some in dirty discussions, while almost unconsciously attempting to reduce them to a minimum, which I call unconscious morality in action. Consciously and unconsciously, we ask or just wish others not to wield such dirty tactics at will, at random. Throughout the BP talk I have had now and then, I've been getting more and more skeptical about how they behave themselves in the context of Wikipedia. To me, however, it is NOT a great surprise, as I understand such is the usual scholarly behavior. At this point, I fear, Ivan would like to set up a straw man. No, he has already done so and attacked it near the end of User talk:KYPark#Korean "false_cognates". --KYPark 03:26, 30 May 2008 (UTC)

See also: #Can someone else please be the bad guy
To begin with, I must simply say that I am not, for a single second, buying the mnemonics argument. KYPark has a very long history of pushing a genetic relationship between Korean and IE languages. That he has switched to calling them mnemonics after the community made it abundantly clear that they were not interested in such etymologies is beyond suspicious. What bothers me most about all of this is that so much time of valuable editors such as Stephen, Visviva, and Ivan (who could be writing real etymologies) has been wasted trying to reason with, clean up after, and debate minor semantics with KYPark. So let me address KYPark very simply. The community has made it abundantly clear that we have no wish to see any comparisons between Korean and IE languages, not as etymologies, not as mnemonics, not at all. I am tired of debating this issue. If you write anymore about comparisons between Korean and IE words you will be blocked for successively longer periods of time. I very ardently plead with you to not do so, because I genuinely have no wish to lose you, as, aside from your Korean/IE hypothesis, you are an excellent editor, and you make very good Korean entries. However, I am unwilling to devote further time to this debate. -Atelaes λάλει ἐμοί 04:37, 30 May 2008 (UTC)
Seconded. -- Visviva 04:40, 30 May 2008 (UTC)
To begin with, I say that Atelaes set up still another w:straw man for me, called mnemonics, so as to cut it down, as I've expected as his or their usual tactics. No wonder at all.
The best of such tactics makes best use of w:contextomy or w:quoting out of context, say, simply "mnemonics" out of "a very long history of pushing a genetic relationship between Korean and IE languages."
I once said to Visviva, who seconded Atelaes a while ago, that the variety of uses of my, hot if you like, Korean-English comparison is rather undefinable, including associationist memory, for which Visviva specifically reminded me of the shorter "mnemonics" that he often attempts himself.
He happened to take 매다 "to weed," for example, with which he associates "hawk" brightly. In contrast or response, I associated (or compared) English mow, Dutch maaien, etc., instead, at least for far better mnemonics. There was no sense or hint of "genetic relation" here, though I myself was quite surprised by the semantic as well as phonetic similarity. It was such an accident indeed.
It was Atelaes that invited me to the BP again and again. But I refused, because I know it is not the place for me to argue as if I were a scholar I hate. Read and edit is all I like doing here.
At last he initiated #Can someone else please be the bad guy. It is very clear from the title that he attempted a wicked personal attack on me! In advance, such a symptom was surely realized from a few trivial encounters.
For the first time, I edited the Citations page of witch, while feeling like being made a witch, so most likely reflecting my miserable feeling on the note. If I am not on the right track, the only thing the admin like Atelaes should do is just to advise me that my way is not the right way! I presume.
ANYWAY DEFINITELY, I was not attempting any Korean-European genetic claim. Had I been so attempting, I should have mentioned Korean (bich, bit) as a cognate with witch, as I do guess! Then, Atelaes would have had every reason for attacking me personally according to their agenda.
I feel like Atelaes being trapped by me. Or, in effect, he trapped himself, or he was trapped anyway, and he tries to escape from it desperately. I see it, as anyone sees it. If NOT, he should not make such a biiig ultimatum, say, block me. Go ahead and block me if you like. I am not afraid of it at all! But never forget that that might be the end of Wiktionary! Don't guess as if I were joking! You can do anything, however harmful it may be to Wiktionary.
I could, would and should not say everything here. I'm just sorry. Remember this is not the end. --KYPark 10:32, 30 May 2008 (UTC)
Everybody take a deep breath :) KYPark, Wiktionary is based on consensus. If you believe that your approach is correct, you may submit a proposal at Wiktionary:Votes. If you do not receive enough supporting votes, then you should abide by the wishes of the Wiktionary community. Before you consider such an option, be aware that you will most likely be outvoted (based on reading the above). Please carefully write your proposal in clear and concise English. -- A-cai 12:18, 30 May 2008 (UTC)
I always thank you for everything indeed. But things may not be so simple as you may think. I need to know the way they think, not yours. --KYPark 12:49, 30 May 2008 (UTC)

Suggestion from OTRS mail

>  I would just like to make a tiny suggestion that would help make your
> website better. I would like to point out that your quick reference
> dictionary does not have any syllable breaks that can help in the
> pronunciation of some words. If you could put syllable breaks on newer
> articles, later posted on your website, it would really help. Thank you!!
I said I'd relay the suggestion to the Wiktionary team, so... here ya go. :) ~Kylu (u|t) 03:56, 30 May 2008 (UTC)
I second the suggestion positively. For Wiktionary should remain evolutionary. --KYPark 04:17, 30 May 2008 (UTC)
The suggestion is not specific enough to act upon. If the suggestion is about pronunciations, then in which languages? For some languages, there are no syllable breaks in the pronunciation, and for many languages they are very difficult to place with certainty. If the suggestion is about hyphenation, then again which languages? For English, hyphenation syllable breaks differ between the US and UK for some words, and I do not know how to reliably present these differences, because I don't know the UK algorithm for determining where to hyphenate. --EncycloPetey 04:22, 30 May 2008 (UTC)
They use math for that? Now I'm scared. --Neskaya talk 07:03, 30 May 2008 (UTC) (Argh I didn't sign in.)
(Actually, we use maths.) Widsith 10:36, 30 May 2008 (UTC)
Now I'm more scared. As if one wasn't enough of that. --Neskaya talk 06:19, 31 May 2008 (UTC)

Dude, I gotta jump in on this one. It's a good suggestion that I have mentioned in individual conversations before. I'm taking the user's request to mean a desire for hyphenation. This is actually the utility I require most from a dictionary in my profession as a sheet music arranger, and for all my affinity for wiktionary, I have to take my business elsewhere for that information. True, it's only useful for very specific applications (one of which happens to be my daily work), and self-hyphenating word processors have removed the necessity of this information from all would-be amateur publishers, but I think it's a worthy addition to our format. Especially since there are US/UK differences, it's then all the more important that this info is ours alone to share. Other sources certainly don't make that valuable effort. I once thought it would be cool to break apart the headword with little dots showing hyphenation, but maybe this goes better in the Pronunciation header. On a slightly larger topic, I would love a standardized pronunciation section, maybe beginning with Hyphenation: Head•word, followed by Simple Pronunciation: hed-wurd, then IPA, SAMPA, Rhymes... Whaddaboutit? -- Thisis0 19:27, 30 May 2008 (UTC)

How about a hyphenations header when needed? The hyphenation doesn't really impact the pronunciation, or the spelling, it's a separate kind of property. RJFJR 19:31, 30 May 2008 (UTC)
The pronunciation section, if fully tricked out with content in table form and a Homophone L4 header, takes up 30-40% of the highly valuable "above-the-fold" space on the first page without offering intelligible or usable context to most non-expert users. Adding another space-burning header strikes me as squandering prime real estate.
OTOH, I would think that syllabification (with stress shown) would be more valuable to most users than what we now provide under the pronunciation header. It could actually appear routinely on the inflection line if we chose, thus offering more value to most users than the existing pronunciation section at no or little cost in above-the-fold real estate. DCDuring TALK 19:46, 30 May 2008 (UTC)
Syllabification could be useful, we already have {{hyphenation}} which is used in the pronunciation section. I think we should have a {{syllables}} or {{syllabification}} template which does pretty much the same thing, but obviously is for syllables not hyphenation - although similar it must always be stressed that they are not the same. There's the slight issue with the UK-US variations, but this is not a big problem - we seem to have managed alright with the rest of the pronunciation section that must be split thusly. There is a Hyphenation algorithm which is fairly good, but it is always possible to find exceptions so any automated task to add them would have to be carefully supervised. I'm not sure whether anyone has attempted to write a syllabification algorithm, but I suspect it'd suffer from the same problem. On a similar, yet very different theme - {{collate as}} now exists to help with the shiny new Project - Text Processing Information, the discussion that took place on the Grease Pit could probably do with some input from the Beer Parlour. Conrad.Irwin 21:40, 30 May 2008 (UTC)
Conrad, the pronunciation is already set up to include syllable breaks; we do not need another new template for that. And in any case it CANNOT be used with the English spelling of the word, because the pronounced syllable breaks sometimes come in the middle of letters. Consider exactly, which hyphenates as Hyphenation: ex‧act‧ly, but for which the syllable breaks occur as /ɛk.sækt.li/. That's one very important reason why IPA or some other transcription is used for pronunciations; you can't put the syllable breaks in using the usual English orthography. And they are only sometimes similar. Even for words that are hyphenated the same in the US and UK, the spoken syllable breaks may be in different locations and cannot be predicted from the spelling alone. --EncycloPetey 21:55, 30 May 2008 (UTC)
Ok, as a complete ignoramus in the pronunciation section I was jumping to conclusions from what was said above. Where is the syllabification information stored? It doesn't seem to be in the pronunciation section at pronunciation or hyphenation - though maybe that's just because I can't read IPA well enough. Interestingly enough Wiktionary:Pronunciation says we do include it, but I can't decipher where or how. I'd disagree with cannot be done algorithmically, but am happy to admit that the Google search isn't that promising. Conrad.Irwin 22:13, 30 May 2008 (UTC)
The little period-looking things, the stress mark, and secondary stress all indicate syllable breaks. You can see them in the examples of pronunciation of exactly that I gave above. These are not consistently marked in all languages, though, because some words and some languages (like Czech, IIRC, or possibly French) do not have any discernable syllable breaks in the spoken forms. For other languages, especially in East Asia, this information is often trivial because the characters are themselves syllables. --EncycloPetey 22:38, 30 May 2008 (UTC)
In the pronunciation string, syllable breaks are shown as periods.
Except that it's not the same character as the period; it's a symbol in the IPA character set. --EncycloPetey 22:43, 30 May 2008 (UTC)
IPA syllable breaks are often left out, but perhaps we should encourage them to always be explicitly entered. (I believe the IPA symbol is a regular period U+002E, no?) Details at IPA#Suprasegmentals.
But explicitly entering them is also a problem. Many words do not have an explicit syllable break, and placing the syllable break in the pronunciation is therefore misleading in those cases. In English, the placement of syllable breaks also varies widely with dialect. I have seen thus far only one dictionary that recognizes this fact (the Cambridge English Pronouncing Dictionary), but even this volume doesn't capture the whole range of variation. It presents only the variation present in the Received Pronunciation and a well-enunciated General American accent. --EncycloPetey 23:59, 30 May 2008 (UTC)
Wow. But this is still soluble by showing multiple dialects, and multiple pronunciations per dialect. We already do that. The trick is knowing when to leave out a syllable break, yes? Michael Z. 2008-05-31 02:01 z
I really wonder how linguists go about syllabification. English speakers tend to break words apart the way that they think of them, but not the way that they sound. The example given is a perfect illustration. Mentally "exact" is the root and we break the word there, but in the pronunciation /ɛgzæktli/ it makes a lot more sense orally to tie the t with the l as part of the same syllable. Scream the word at the top of your lungs and you'll see how awkward it is to break them apart. It also doesn't really matter all that much how syllables divide a single word because the consonant clusters at the end of a word can attach themselves to the next word. Consider sand eel = sænd + i:l where the syllables span words since sandy = sæn.di. Discernable syllable breaks are probably far fewer in number and far less important than most of us suspect. 75.54.80.198 07:28, 4 June 2008 (UTC)
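The convention described in this sub-thread (syllable boundaries marked in the IPA string by the break symbol and by the stress marks, independent of the written hyphenation) can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the break symbol is assumed to be U+002E as suggested above, and the hyphenation separator is assumed to be the hyphenation point U+2027.

```python
import re

# Assumed boundary markers in an IPA transcription:
# "." (syllable break), U+02C8 (primary stress), U+02CC (secondary stress).
IPA_BOUNDARY = re.compile("[.\u02C8\u02CC]")

def ipa_syllables(transcription):
    """Split an IPA string into syllables at explicit boundary marks."""
    body = transcription.strip("/[]")  # drop the enclosing slashes or brackets
    return [part for part in IPA_BOUNDARY.split(body) if part]

def hyphenation_parts(hyphenation):
    """Split a written hyphenation on the hyphenation point (U+2027)."""
    return hyphenation.split("\u2027")

spoken = ipa_syllables("/ɛk.sækt.li/")
written = hyphenation_parts("ex\u2027act\u2027ly")
print(spoken)   # ['ɛk', 'sækt', 'li']
print(written)  # ['ex', 'act', 'ly']
```

Both lists have three parts, but the boundaries do not line up: the spoken break falls inside the letter "x", which is why the two kinds of information are kept apart.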

Hyphenation isn't only British/American, it is also a matter of style. It appears that some dictionaries provide hyphenation mechanically at every possible break, relying on the editor to use discretion (or not), while others recommend better hyphenation breaks.

For example, the NOAD indicates every possible location, while the CanOD is more conservative, and explains some of the reasons for its recommendations in the frontmatter—for example "plane-tary" would not be smooth reading. (Note that the place of hyphenation sometimes varies between the two, and I have no idea if the CanOD's corresponds to British usage, or is a Canadian style, or is based on the corpus.)

Hopefully any automated algorithm would at least avoid breaks like "throw in some fresh shit-ake mushrooms".

NOAD plan·e·tar·y fore·fin·ger fin·ger·nail gov·ern·ment self-gov·ern·ment
CanOD plan·et·ary fore·finger finger·nail gov·ern·ment self-govern·ment

We need a convention for indicating the source or style of hyphenation beyond just British/American. Do we cite a source? Michael Z. 2008-05-30 23:30 z
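One possible way to think about the question of annotating by source is to store each hyphenation keyed by the dictionary it came from and compare the break positions. The sketch below is a rough illustration under assumptions: the function name and data layout are invented here, and the separator is assumed to be either a middle dot (U+00B7) or a hyphenation point (U+2027).

```python
import re

# Accept either a middle dot or a hyphenation point as the break marker.
SEPARATORS = re.compile("[\u00B7\u2027]")

def break_offsets(hyphenated):
    """Return the set of character offsets at which a break is allowed."""
    offsets, position = set(), 0
    for part in SEPARATORS.split(hyphenated)[:-1]:
        position += len(part)
        offsets.add(position)
    return offsets

# Hyphenations of "planetary" as quoted above, keyed by source.
entries = {
    "NOAD": "plan\u00B7e\u00B7tar\u00B7y",
    "CanOD": "plan\u00B7et\u00B7ary",
}
offsets = {source: break_offsets(text) for source, text in entries.items()}
shared = set.intersection(*offsets.values())
print(sorted(offsets["NOAD"]))   # [4, 5, 8]
print(sorted(offsets["CanOD"]))  # [4, 6]
print(sorted(shared))            # [4]
```

Here the two sources agree only on the break after "plan", so neither list is simply a subset of the other, which is the reason a source or style label would be needed.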

Yikes! CanOD: an·alogy, an·aly·sand, an·aly·sis. NOAD: a·nal·o·gy, a·nal·y·sand, a·nal·y·sis! NOAD also allows ex-acting, coin-cidence, read-just, leg-ends. Shouldn't a quality hyphenation guide omit such breaks which are more than awkward? Or should we report everything recommended by some reference? Michael Z. 2008-05-30 23:54 z
All that the latest portion of this thread shows is that NOAD sucks balls at hyphenation and should never be mentioned in a discussion about it. -- Thisis0 00:21, 31 May 2008 (UTC)
It seems possible to me that NOAD haven't checked their automated hyphenations; the four mistakes Mzajac listed above (co-in-ci-dence, ex-act-ing, leg-ends, read-just) are replicated by the Knuth-Liang algorithm (though it does a better job than EP's example and the top of Anarchy of Pedantry). From reading about hyphenation it seems that we can say whatever we like about it, we are just providing a guide - there is no strict right or wrong (beyond common sense and aesthetic judgment). Obviously that poses problems in the "No, I'm right" style of wiki-dispute, however given the low importance of this I'm hoping people won't get too worked up about them. Conrad.Irwin 00:38, 31 May 2008 (UTC)
Related reading: On Hyphenation - Anarchy of Pedantry. Michael Z. 2008-05-30 23:56 z
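For readers unfamiliar with the Knuth-Liang approach mentioned here, the following is a minimal Python sketch of pattern-based hyphenation. It is an illustration under assumptions, not the TeX implementation: the function names are invented, the pattern list is a tiny hand-picked set in the style of the TeX patterns (a real pattern file has thousands of entries), and the minimum fragment lengths of 2 and 3 are assumed defaults.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter scores."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    position = 0
    for ch in pattern:
        if ch.isdigit():
            scores[position] = int(ch)  # score for the gap before letters[position]
        else:
            position += 1
    return letters, scores

def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of `word`, breaking where the highest score is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)    # levels[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = table.get(padded[start:end])
            if scores is not None:
                for offset, score in enumerate(scores):
                    levels[start + offset] = max(levels[start + offset], score)
    parts, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if levels[k + 1] % 2 == 1:      # odd score allows a break before word[k]
            parts.append(word[last:k])
            last = k
    parts.append(word[last:])
    return parts

# Illustrative patterns only; assumed for this example.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Even numbers in a pattern forbid a break and odd numbers allow one, with the highest number at each position winning. Because the output depends entirely on the pattern set, breaks such as "leg-ends" appear whenever no pattern suppresses them, which fits the point above that automated output needs supervision.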
Ultimately, hyphenation is about making printed text look better. Many dictionaries simply present all the possible places one could conceivably hyphenate, even to including hyphenation breaks that no sane publisher would ever use. Consider this humorous example:
American families in the 1950s empathized with Lucy and Ricky Ricard-
o. Each week, millions turned in to watch the antics of Lucy and friend E-
thel Mertz.
No sane publisher would hyphenate this way, but if one follows the kind of hyphenation advice given in some dictionaries, then this could be the result. ---EncycloPetey 00:11, 31 May 2008 (UTC)
All my sources would unanimously say Eth-yl and Ri-car-do. -- Thisis0 00:47, 31 May 2008 (UTC)
Admittedly I fabricated a quick example, but the implication stands. How would your sources hyphenate mighty? --EncycloPetey 20:43, 31 May 2008 (UTC)

I'm convinced that my daily use of hyphenation is much more esoteric than I had assumed. I require (and publish) hyphenation for ev-'ry sung syl-la-ble in vo-cal mu-sic. I suppose in light of this discussion, this is somewhat a combination of syllabification and hyphenation. However, I have found most reputable dictionaries provide standardized hyphenation of every syllable, seemingly for my industry alone. (Anyone know any other widespread applications? Aiding pronunciation?) Standardized hyphenation by syllable becomes more difficult as I publish Italian and Spanish songs (which I have hyphenated as best as I can, usually with each syllable beginning with its consonant). English does have standardized hyphenation -- usually to avoid awkward or misleading word fragments, especially those that would lead one to pronounce a different vowel sound before getting to the rest of the word. Properly hyphenated "mu-sic" does not keep the letters of its component "muse" together; also it avoids the desire to begin to pronounce "muh" if one saw only "mu-" at the end of a line. "Wom-an" avoids a momentary "woh" pronunciation, and seems to deemphasize the subordinate relationship to "man". "Vo-cal" avoids a preemptive "vawk" sound when reading. Anyway, I still think this is worth the effort (to start using the hyphenation template, I guess -- which I didn't know existed) especially if we can delineate some of the US/UK specifics as well as discover standardized hyphenation for other languages. -- Thisis0 00:41, 31 May 2008 (UTC)

Does the hyphenation template have any provision for indicating stressed syllables? Could it be made to do so? I realize that this is a simplistic approach to pronunciation, but, then again, so many of us ordinary users are simple people. Does it pay to show multiple kinds of hyphenation (and stress)? DCDuring TALK 01:18, 31 May 2008 (UTC)
Inspection suggests that nothing would prevent stresses from being added with a character like "'", presumably at the beginning of the stressed pseudo-syllable (pace EP). DCDuring TALK 01:26, 31 May 2008 (UTC)
Excuse me, but hyphenation has nothing to do with stress or pronunciation. Stress should never be marked in the hyphenation template. --EncycloPetey 20:39, 31 May 2008 (UTC)
I get it now. Obviously, music publishing has different needs than prose, and this is where guides like NOAD's make sense. Even for prose writers or typesetters, it may be useful to know every place where a word can be hyphenated, for extreme cases or art. But it would be beneficial to also have a practical guide for non-expert writers.
So how do we annotate all of this? Michael Z. 2008-05-31 01:57 z

Hyphenation example

Let's see what an extreme example of hyphenation can look like. Can anyone add more permutations from other dictionaries, or other English styles? Or add a different word if you know of a more diverse example. Michael Z. 2008-05-31 02:14 z

  • Hyphenation (US, NOAD): a‧nal‧y‧sand
  • Hyphenation (US, MW3): anal‧y‧sand
  • Hyphenation (Canada, CanOD): an‧aly‧sand

Or a functional description:

  • Hyphenation (US, every break): a‧nal‧y‧sand
  • Hyphenation (Canada, for prose): an‧aly‧sand

With stress shown:

  • Hyphenation (US, MW3): a:nal‧y‧:sand
    (":" is an approximation of the MW3 notation for a syllable that could get primary stress in some pronunciations, or be unstressed in pronunciations that stress the other syllable so marked. "'" is an approximation of their stress marker.)
Stress should never be included in hyphenation, because it is unrelated to the function of hyphenation. Stress is peculiar to spoken language; hyphenation is peculiar to written language. --EncycloPetey 20:41, 31 May 2008 (UTC)
If there were a way to finesse it, I would have been willing to argue with you, EP, but the messiness of the very first case suggested that it would not be simple. I will continue to look for ways to get naive new users to get pronunciation benefit from Wiktionary. DCDuring TALK 20:48, 31 May 2008 (UTC)

piece of paper et al.

Is there a policy or practice about whether such entries should be included? This would apply to uncountable noun senses for many, many entries. In the case of paper one would need scrap of paper, sheet of paper, ream of paper, roll of paper, stack of paper, etc. If the phrases were in the entry for paper they would be findable by search. Though the entry for paper could stand to have a list of such ways of achieving countability for quantities of paper, I see much less value in the individual entries. They are certainly related terms, but might warrant a separate rel table. DCDuring TALK 20:45, 30 May 2008 (UTC)

Piece of paper might merit an entry, as it is slightly idiomatic (when someone asks for a piece of paper, they're not asking for a piece, but a sheet). The rest should not be included anywhere, unless they have similar idiomaticity (if anyone knows a real word to replace idiomaticity in this sentence, please feel free to do so). -Atelaes λάλει ἐμοί 21:00, 30 May 2008 (UTC)
They might be asking for a "scrap" of paper or even a memorandum or an index card or a post-it note. It seems to have much more to do with the situation than the words. I didn't really think that we could handle much of that kind of context dependence. DCDuring TALK 21:11, 30 May 2008 (UTC)
Hmmmm...that's a good point. -Atelaes λάλει ἐμοί 21:17, 30 May 2008 (UTC)
Is there a policy or practice in this regard? DCDuring TALK 21:07, 30 May 2008 (UTC)
Well, my understanding about the related terms bit (if that's what you're asking about) is that terms only merit entry in the related terms section if they merit their own entry. This doesn't mean that they actually have their own entry at the time, but it is conceived that they would eventually. -Atelaes λάλει ἐμοί 21:17, 30 May 2008 (UTC)
I'm with Atelaes, piece of paper is idiomatic, the rest are SoP. However as they are all reasonably common I wouldn't complain if the entries existed. It might be better to have these as some sort of extended usage note at paper, as opposed to a "related terms" section, if we aren't treating them as terms and giving them entries. Conrad.Irwin 21:28, 30 May 2008 (UTC)
Maybe I ought to take a look at a few of the uncountable words and determine whether there is anything of interest for any subset of them. Many achieve countability by generic means: "item of", "instance of", "case (situation) of", "[container] of", "[measure] of". Some may be like paper, having more idiosyncratic units. I am not sure that I see the idiomaticity of piece of paper and sheet of paper, as opposed to paper having its own particular units. DCDuring TALK 21:44, 30 May 2008 (UTC)
Isn't "piece of" one of the most generic ways of making countable an uncountable? DCDuring TALK 21:47, 30 May 2008 (UTC)
Right. And if you doubt it's idiomatic, you're welcome to prune the list. (Just be sure to comment it.) And if you disagree with that judgement, create the entry rather than putting comments in the list like <!--this is idiomatic--> which no one will ever see anyways. 75.54.80.198 06:47, 4 June 2008 (UTC)
It might be worth looking at how this is handled in our Japanese entries, since the phenomenon is more widespread in Japanese. In English, it's rare (piece/sheet of paper, head of cattle, pair of jeans) but it's the norm (as I understand it) in Japanese. --EncycloPetey 21:49, 30 May 2008 (UTC)
Well then I guess "piece of" anything would be out in English, per DCDuring's comment, and we would somewhere have a table of measure words for uncountable nouns like paper. How long would that list be, in scrolls and/or yards? And would it explain any of the terms? I rather like the current solution since it's not clear that "piece" means a sheet and "scrap" means a piece. Of course the problem is that we then have to get into these tedious debates in order to have something like bar of soap or sheet of paper deleted as nothing more than sum of parts. 75.54.80.198 06:47, 4 June 2008 (UTC)

Language specific help templates

This topic involves both general and technical issues, but I hope the tech junkies who frequent the GP will still read it. I recently created {{attention}}, which is intended to mirror the functionality of templates such as {{la-attention}}, {{zh-attention}}, but work for all languages. Basically, the template is inserted into an entry (with the language code as the first parameter) and places it in a category for people with specific knowledge of that language to look over and, if necessary, clean up. In addition to putting, say, Latin entries in the Latin cleanup category, this can also be used in entries of any language which need attention from an editor with capabilities in a different language. So, for example, if an English word comes from Ancient Greek, but the person writing the etymology has limited Ancient Greek skills, they can put {{attention|grc}} in the entry, and I'll see it and look over the entry. In addition to simply advertising the existence of this template, I am hereby proposing that {{rfscript}} be deprecated in favour of this template. The problem with {{rfscript}} is that some scripts, such as Cyrillic and Arabic are used in so many languages that it is unlikely for any editor to know enough about every language which uses that script to be able to adequately respond. Just because a person knows Hindi, does not mean that they will know Sanskrit or Marathi. Just because a person knows Hebrew, does not mean they know Aramaic or Yiddish. Granted, we are fortunate enough to have Stephen working for us, who seems to know every language ever used, and so the rfscript method has worked reasonably well up to this point. However, I managed to stump even him with a Pashto request at πάρδαλις (párdalis) (although, I do believe that's the first time I've ever managed to do that :-)). 
The template currently puts the entry in the category [[Category:XXX words needing attention]], but I think it might be prudent to switch it to [[Category:Words needing XXX attention]], which makes more sense when put into foreign language entries. Thoughts? -Atelaes λάλει ἐμοί 22:44, 30 May 2008 (UTC)

Actually, I like the current setup. I find that it's easy to spot an out-of-language entry in the category and understand that there's a translation or etymology in need of care (such as when cutify turned up in Category:Latin words needing attention). I only see a potential problem for categories that are not being maintained and so grow out of control (like Category:Japanese words needing attention). In those cases, further subdivisions or something might be useful. Personally, I prefer seeing the language identified up front in the category name. --EncycloPetey 22:51, 30 May 2008 (UTC)
I also like the subdivision. I think more specificity is good in this case- the 'Requests for scripts' can be subcategories of the 'words needing attention'. If there are many words needing general attention then it becomes hard to find specific problems (like scripts). Just because someone knows the script doesn't mean they know the language. Nadando 22:54, 30 May 2008 (UTC)
I agree that something needs to be done. Although I know a lot of scripts, I try not to mess with languages that I don’t know much about. I do a lot of Arabic and can do some Persian when forced to, but Urdu really takes too much effort, and Pashto is worse. I check the request for Arabic script page regularly, but I don’t think anyone who works in a language other than Arabic does, so the Urdu goes unattended. It’s the same story with Cyrillic. I do a lot of Russian and some Bulgarian, but there are many languages that use Cyrillic and many more transliteration systems, so it can be very difficult to retransliterate words in unfamiliar languages. I think it would be nice to list requests for Arabic script work in a central location as it is done now, but also on a page for the specific language, and that page should include a link from the "Category:XXX language" page so that the relevant contributors can be made aware of it and can find it. —Stephen 08:06, 31 May 2008 (UTC)
I like the current setup; personally, I've never minded having to subvert it a bit when necessary (e.g. adding {{la-attention}} to cutify so a Latin-speaker — EncycloPetey, as it turns out — could fix the etymology, or adding {{rfscript|Cyrl}} to various entries that had the Cyrillic but needed someone — usually Stephen — to add a transliteration). However, I'm also quite fine with a setup that comes pre-subverted. :-) —RuakhTALK 15:08, 31 May 2008 (UTC)
Should we modify {{rfscript}} to accept two arguments? One could identify the script and categorize it accordingly, but the other could identify the specific language (when known) and add it to the appropriate language attention category. That way, Urdu entries show up in an Urdu-specific category as well. --EncycloPetey 20:35, 31 May 2008 (UTC)
I think that's a great idea. :-) —RuakhTALK 13:00, 1 June 2008 (UTC)
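The two-argument idea above can be modelled outside the template system to see which categories an entry would land in. The Python sketch below is only a rough model under assumptions: the function name, the lookup tables, and the script category name are hypothetical, and only the "XXX words needing attention" pattern is taken from this discussion.

```python
# Hypothetical lookup tables; a real template would use the site's own code lists.
SCRIPT_NAMES = {"Cyrl": "Cyrillic", "Arab": "Arabic"}
LANGUAGE_NAMES = {"grc": "Ancient Greek", "la": "Latin", "ur": "Urdu"}

def rfscript_categories(script_code, lang_code=None):
    """Return the categories for a script request, plus a language category if known."""
    # The script category name is an assumption made for this example.
    categories = [f"Category:Requests for {SCRIPT_NAMES[script_code]} script"]
    if lang_code is not None:
        # Follows the "XXX words needing attention" naming described above.
        categories.append(f"Category:{LANGUAGE_NAMES[lang_code]} words needing attention")
    return categories

print(rfscript_categories("Arab", "ur"))
print(rfscript_categories("Cyrl"))
```

With both arguments an Urdu request is filed under the Arabic-script category and the Urdu attention category; with only the script argument it behaves like the one-argument template.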