This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

Breaking news about Wonderfool on Wikipedia

It turns out that w:User:Robdurbar is a sockpuppet of Wonderfool. See w:Wikipedia:Requests for checkuser/Case/Robdurbar for the evidence.

Due to these results, a community ban proposal has been made at w:Wikipedia:Community sanction noticeboard#Shocking news, and community ban proposal. I am inviting others who have had experience in dealing with this Dr. Jekyll/Mr. Hyde vandal to add their comments to the community ban discussion. Jesse Viviano 16:05, 1 May 2007 (UTC)

Sorani Kurdish Language

I found that the Kurdish Wiktionary is in Kirmanji only, which is written in Latin letters. I tried to type in many words in Sorani Kurdish, written in Arabic letters, but couldn't for several reasons:

  1. There is no Sorani Wiktionary yet. Is it possible to open a Sorani Wiktionary? I could contact some Kurdish linguists who can translate and help.
  2. It is very difficult to type in Sorani. I checked out the Farsi one for examples and found some help at the bottom of the editing page (also found on the English editing page when selecting Arabic from the drop-down box; there are six Persian letters).

Please let me know if creating a Sorani Wiktionary is feasible. Gbeebani 05:25, 2 May 2007 (UTC)

(Sorani alphabet just for reference... —Stephen 14:44, 2 May 2007 (UTC)):

ی , ێ , هـ , وو , و , ۆ , ن , م , ل , ڵ , گ , ک , ق , ڤ , ف , غ , ع , ش , س , ژ , ز , ڕ , ر , د , خ , ح , چ , ج , ت , پ , ب , ئـ , ا

Stephen, can you drop those into MediaWiki:Edittools, or is this too rare to be needed here? --Connel MacKenzie 15:19, 2 May 2007 (UTC)
I would consider Sorani to be the principal Kurdish dialect, so it should not be too rare. The question is whether to put it under Sorani or under Kurdish. I suppose Sorani would be best, since some other Kurdish dialects prefer Roman or even Cyrillic letters. —Stephen 16:56, 2 May 2007 (UTC)
I added Sorani to MediaWiki:Edittools, but I can’t get it to display correctly. The name Sorani does not appear, but all the languages from there down are shifted one place: "Spanish" displays Sorani; "Turkish" displays Spanish, and so on. —Stephen 17:28, 2 May 2007 (UTC)
You have to keep MediaWiki:Monobook.js#addCharSubsetMenu in sync. Robert Ullmann 17:37, 2 May 2007 (UTC)
  • You are right about the Spanish. But since Persian is displayed under Arabic, would it be easier to display the Sorani under the Arabic as well?
  • Also, I hope to get an answer on the feasibility of creating a Sorani Wiktionary (one in which the menu and sidebars are displayed in Sorani and not Kirmanji). I noticed that the Norsk language is listed twice, so I would assume that Kurdish needs the same: Sorani and Kirmanji. Or simply a link at the top of each page where you can switch the view from one dialect to the other. My dad has a Sorani-English dictionary that he asked me to input into Wiktionary and I'm anxious to start. Gbeebani 18:26, 2 May 2007 (UTC)
I've also added the ڒ "ra with a hachek" and the regular Arabic "ya" to the drop down menu. --Dijan 20:50, 2 May 2007 (UTC)
Also, I've noticed a little error with the "he" that is used here for the Kurdish. I'm referring to the "connecting" "he". In practice Kurdish uses the normal Arabic "he" for the connecting one. Why is the initial, medial-only form (a separate Unicode point) used here? The problem occurs when someone tries to copy a word containing the special initial, medial-only "he" that is currently displaying in the drop down menu, because it does not show as a connecting letter at all in non-wiki environments. It simply displays as a "medial" he and does not connect at all to any surrounding letters. I realize that there might be confusion between the "initial, medial" "he" and the "final-only" "he" if using the Arabic character, because both will appear the same ("initial") in the drop-down menu. Is there a way to place both in there, but to point out that they're different? --Dijan 20:56, 2 May 2007 (UTC)
Thanks for your efforts Dijan, you are correct about the "he" letter. the initial and medial "he" are there ه‍ـand so is the line "he" ە. But there is no final "he". I cannot write my name گەنج correctly because the "he" does not connect with the گ.
Also, the letter ء is missing from the list. I can type it because I have an Arabic keyboard, but it would be a good idea to add it to the list. Example: نەء. Notice the "he" again.
Another problem I noticed is the font size is way too small. I can easily read the English letters but not the Kurdish. If someone can increase the font size one or two points then it should be more readable.
As for size, just use the Kurdish font template: {{ku-Arab|گه‌نج}}. I changed the way هـ ("he") works, but now it looks just like the next letter, which is "ae". —Stephen 22:52, 2 May 2007 (UTC)[reply]
I've changed the order of the two "he" letters. I've moved the "ae" sounding one after "d" so that it is distinguished from "h". --Dijan 22:57, 2 May 2007 (UTC)
The {{ku-Arab|گه نج}} is great. Is there a similar template I can use to change the font and make it look like the one displayed from the drop-down menu, like گه نج? Also, what I meant about the medial "he" is that writing my name as گەنج, or using the other "he" as گهنج, or using the second "he" with a space as گه نج, are all incorrect. In the first example, the "he" is not connected to the گ because it is a line "he" as in رەنج. The second example is a "he" that can be initial or medial depending on where you type it, as in بەرهەمهێنەر. The third one is incorrect because I have to put in a space after the "he", which makes the word stretch too long and difficult for the eyes to read. When typing in MS Word, I use the Unicode font Ali Web Malper, which reduces the space between the letters and makes it much easier to read. The Wikipedia article on the Kurdish alphabet has a link to the website from which you can download the Unicode fonts. Gbeebani 00:18, 3 May 2007 (UTC)
I don't speak Kurdish or know the Arabic alphabet, so I may be misunderstanding what you need, but what you probably want to do is put a zero-width no-break space (U+FEFF) after the "he". To insert that character, type &#xFEFF; (ampersand-pound-sign-x-F-E-F-F-semicolon). (If that does what you want, then we can see about finding an easier way to input it here.) —RuakhTALK 02:35, 3 May 2007 (UTC)
Nice work, Stephen, Robert and Dijan! Ruakh, I thought we were never supposed to use the HTML expressions that way (as they foil the WM search, always.) --Connel MacKenzie 02:47, 3 May 2007 (UTC)
Actually, in Persian and some other languages of that area that are written in Arabic, some of these are actually part of the script. In Persian and related languages, ‌ (zero-width nonjoiner) is required in many words. For example, کس‌کش, which without the ZWNJ would become کسکش, an unacceptable misspelling. —Stephen 03:09, 3 May 2007 (UTC)[reply]
I agree with Ruakh that the zero-space character will do what Gbeebani wants. It will display the "he" correctly and is used on Persian wikis. However, the only thing I have against it is that it becomes part of a word. It becomes part of the URL. I would hate to type it in only to see the word display correctly, when I already know what letters the word is composed of. The zero-space character is used in Persian to connect two words and make a single construction. This is especially true when adding suffixes and prefixes, where "he" has to remain in the final position so as to be recognized as "e" and not as "h". I don't believe that it has any purpose in Kurdish, or that it is even used. I understand that the present "he" character, like Gbeebani said, does not connect at all. I believe that is the flaw of the Tahoma font used in the KUchar template. Tahoma does display Arabic characters in the cleanest way possible to read, however. I think we should just test various popular Unicode fonts until we find one that is compatible with displaying "he" while at the same time being as clean as Tahoma. I've included a font called "Unikurd Web" in the list. I will try to use that one as my primary and report the results. If this works, it is not to say that others without the font will not question the problem. --Dijan 03:08, 3 May 2007 (UTC)
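
For reference, the invisible characters mentioned in this thread can be inspected directly. The following is a minimal Python sketch, not anything Wiktionary itself runs; the variable names are made up for illustration, and the Persian strings are the ones from Stephen's example, written with escapes so that the ZWNJ stays visible in the source.

```python
import unicodedata
from urllib.parse import quote

# Code points discussed above; the comments give the official Unicode names.
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE
HEH = "\u0647"     # ARABIC LETTER HEH
AE = "\u06d5"      # ARABIC LETTER AE

for ch in (ZWNJ, ZWNBSP, HEH, AE):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Stephen's Persian example, with and without the ZWNJ.
with_zwnj = "\u06a9\u0633" + ZWNJ + "\u06a9\u0634"
without_zwnj = "\u06a9\u0633\u06a9\u0634"

# The two strings are different sequences, hence different page titles.
print(with_zwnj == without_zwnj)          # False
print(len(with_zwnj), len(without_zwnj))  # 5 4

# An invisible character in a title still shows up in the URL.
print(quote(ZWNJ))  # %E2%80%8C
```

This illustrates Dijan's concern: a zero-width character typed only to fix the display still becomes part of the word and of its percent-encoded URL.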

Of course (in case this hasn't been made clear) the English Wiktionary is a dictionary of words in all languages, and so we would welcome Kurdish entries here as well. If the new Wiktionary creation process is a daunting undertaking, don't let that scare you away from expanding our rather small Kurdish vocabulary at the English Wiktionary. Dmcdevit·t 06:25, 3 May 2007 (UTC)

Of course, Dmcdevit. Although I just posted this to Dijan, I think I should post it here, too. I tried using Firefox for editing, but Sorani Kurdish is not available in the drop-down box. I use Firefox because it is easier to listen to pronunciations with it, as I don't know how to make IE7 open Ogg files.
Thanks for sending me all the editing help links, Dmcdevit; now I can finally use them. Gbeebani 07:04, 3 May 2007 (UTC)

question on year-old business: straight vs. curly apostrophe

When I logged in I saw that I had a new message. It was this on my talk page:


Hello, and welcome to Wiktionary. Thank you for your contributions. I hope you like the place and decide to stay. [...] If you have any questions, see the help pages, add a question to the beer parlour or ask me on my Talk page. Again, welcome!

Please don't start moving entries with different apostrophes. The subject is being discussed in the Beer parlour. Thank you. — Vildricianus 16:59, 5 April 2006 (UTC)

I haven't logged in for a long time, I guess. Evidently he was referring to something I did on 5 April: I made "o'clock" consistent with the other members of the English contractions category by changing its non-ASCII curly apostrophe (I believe it was U+2019, Right Single Quotation Mark) to an ASCII straight apostrophe (U+0027 Apostrophe), like all of the other words on that page. However, I can't find anything relevant either here or in the Archive for that month. Vildricianus's talk page says "On permanent wikibreak. Writing some ambitious stuff.", so I can't ask him that way, and I'm taking the alternative he suggested: here.

Has the matter been settled since then? The closest thing I can see is Wiktionary:Spelling variants in entry names. But this is not a variation of spelling, capitalization, or punctuation, but of encoding. Nobody distinguishes straight and curly apostrophes in English, only in computer encoding. My change was a matter of consistency, and I notice that it has not been reversed. So can someone please explain what, if anything, was the matter with my action?

Thnidu 01:52, 4 May 2007 (UTC)

Look at any (printed) book, and you’ll find that it uses curly apostrophes (an exception is the Lojban language). This is why I began to change some straight apostrophes to curly apostrophes. But I got the same comment, and I stopped... Lmaltier 10:06, 4 May 2007 (UTC)
We have to use straight apostrophes in entry names. This is a technical issue with searches and a few other things. So it must be at o'clock. If you want to make a link prettier by using an alternate rendering, that is generally okay. It is a bigger issue for other languages, where the straight apostrophe is wrong but we use it anyway; there, link alternation is important. Thnidu, your 5 April 2006 move was correct (as you note, it is still there), but it was under some discussion then. Robert Ullmann 12:19, 4 May 2007 (UTC)
I say: let’s enter the Unicode era. But there seem to be problems using the curly apostrophes (don’t ask me what they are). My solution to this has been to always use the curlies, and make a redirect page to the page with straight quotes. I think this way we satisfy all needs. H. (talk) 15:05, 24 May 2007 (UTC)
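
The convention described in this thread (straight apostrophe in the entry name, curly form only as a redirect or link rendering) can be sketched in a few lines of Python. This is only an illustration under that assumption; the function names entry_name and needs_redirect are hypothetical and do not correspond to any MediaWiki or bot API.

```python
CURLY_APOSTROPHE = "\u2019"     # RIGHT SINGLE QUOTATION MARK
STRAIGHT_APOSTROPHE = "\u0027"  # APOSTROPHE

def entry_name(title: str) -> str:
    """Return the title with curly apostrophes replaced by straight ones."""
    return title.replace(CURLY_APOSTROPHE, STRAIGHT_APOSTROPHE)

def needs_redirect(title: str) -> bool:
    """True if the spelling differs from its straight-apostrophe entry name."""
    return entry_name(title) != title

print(entry_name("o\u2019clock"))      # o'clock
print(needs_redirect("o\u2019clock"))  # True
print(needs_redirect("o'clock"))       # False
```

Under H.'s approach, a title for which needs_redirect is true would be created as a redirect to the page returned by entry_name.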


Where is Werdnabot? This page hasn't had an archived section since the middle of April? Last seen 17 April ...

Does anyone know how to contact Werdna? There are user pages all over (of course ;-), but where is the one he reads? Robert Ullmann 12:27, 4 May 2007 (UTC)

Um, yipes? w:User talk:Werdna! --Connel MacKenzie 04:53, 5 May 2007 (UTC)
Perhaps we could ask its clone, User:Shadowbot3, to archive our stuff instead. — Beobach972 05:59, 5 May 2007 (UTC)
Given that Werdna's user page there is gone, and the talk page itself is/was being archived by Shadowbot3 ... ? Robert Ullmann 14:40, 7 May 2007 (UTC)


I've noticed a flurry of "Classifier" headings in ==Thai== sections, but I don't see a Wiktionary:About Thai that either discusses or describes it. Offhand, it looks sort of valid? --Connel MacKenzie 04:47, 5 May 2007 (UTC)

A Thai classifier is the same thing as the Japanese counter (助数詞), also called a count word or counter word, used when giving the number (or sometimes mass quantity) of an item. Also used in Chinese and Korean, and I think Vietnamese. At some point we might want to standardize the name for this POS (WT:AJA uses "Counter"), but for now "Classifier" is the name used for Thai in the linguistics literature. Robert Ullmann 14:05, 7 May 2007 (UTC)

This isn't right

The ordinals tredecillionth through novemdecillionth each have three definitions and collectively fewer than three good quotations among them after sitting on RFV for over four months. Judging from Google Books hits and the like, they could sit another eight months and probably wouldn't have but a quotation each, on average, that would count toward our criteria. By those criteria, the pages should be deleted outright.

The next larger ordinal, vigintillionth, is attested! So our criteria say that there should be a huge gap. By the way, all of the corresponding cardinals have been attested. So the only question is whether -th is legitimate or not. Our criteria are rather spotty on where they say "not". But wait, there's more! All of these were submitted in a batch of what must be a record 39 RFV's by a single contributor in the period of a day and a half.

Some figures

[Notably, besides the 16 numbers, 10 of the other words have now been fully cited, and of the remaining 13, 4 are in my opinion truly questionable still (seem real but difficult to cite), with various fates, and the legitimacy of 4 other terms (including compound words) could have been decided on other grounds. That leaves 5 credible nominations of completely erroneous entries, as I see it. In other words, 8 cardinals + 10 cited words = 18 which were correctly put up to RFV yet passed with flying colors, 4 questionable + 5 erroneous = 9 which were correctly put up and still have not been cited (although one was passed anyway for some reason), and 8 ordinals + 4 others = 12 which should never have been put up in the first place (some cited, most not).
Taking what I understand to be Connel's view, of the 23 that are not numbers, around 4 to 6 were passed that are still questionable or in some way illegitimate nonetheless, 5 to 7 somehow miraculously passed with legitimate quotations that materialize out of thin air, and 12 were deleted or can be soon, proving his prowess. In other words, 8 cardinals + 1 ordinal + 5 to 7 other well cited = about 15 which were passed legitimately, 7 ordinals + 12 deletions = 19 which were failed legitimately, and then around 5 which should never have been passed.]

The point is that, depending on whose position you take, the batting averages are completely different. My opinion, of course, is that my point of view is correct. ;-) Given the current size of the RFV pool, we cannot be attesting each and every variation of spelling, regular inflection, and the like, much less the regionality of each word or the degree to which a word is illiterate or what not. With a flooded RFV, there are two paths we can take at this point. One is to do a mass deletion and blame the original contributors for their lack of dedication. Fine by me, but then I'm not going to be the one to patrol the more ambitious contributors and their deleted-for-a-year pet words. I'll be very happy combing through the RFV failed entries for those with the most potential. Just reserve me the right to cry bad faith for flooding, e.g. too many clearly in widespread use.

The other option is to just let words like sextilian, undecomino, and the ordinals above go through automatically, and not RFV words like complexification just because you think they couldn't possibly be taken seriously. The point of RFV isn't to prove that the word is serious or not. And in my opinion, it shouldn't be anywhere near as demanding for regular, active production rules like -th. DAVilla 23:05, 5 May 2007 (UTC)

As a side-rebuttal to the question of appropriateness:
  1. The "numbers" series had previously (Ec) been deleted outright.
  2. The "numbers" series was entered by sockpuppets of a single user.
  3. The "numbers" WT:LOP series was entered by that same single user.
If those aren't sufficient reasons to "question" (via RFV) the series, please let me know.
--Connel MacKenzie 05:08, 7 May 2007 (UTC)
Yeah, well, I remember EC deleting a page that had something like five votes in favor of keeping and one against, that being EC himself. I'm quite happy that there is at least a process now, even if it is flooded.
By the way, I'm making a distinction here between cardinals and ordinals. I don't blame you for nominating the cardinals... or the ordinals for that matter, although I would like to put forward this proposition. Since -th is an active production rule in English, and since the cardinals have been verified, can we take it on faith that the others are correct, based simply on a reference or the like to verify spelling, even though that's only mention? This is exactly the same test as recently proposed for gummy, singular of gummies. DAVilla 15:57, 7 May 2007 (UTC)
My objection is to the made-up "large" sock-puppet-added cardinal numbers, not the -th rule; on the other hand, normal numbers have no problem whatsoever attesting the ordinal forms. Perhaps a lack of attestation for the ordinals could be a test for which cardinal numbers should be marked with {{neologism}}? --Connel MacKenzie 23:52, 7 May 2007 (UTC)
Hey, I have no problem marking words that have been in use since the 1800's as neologisms, if that's what the community really wants. But... DAVilla 12:06, 8 May 2007 (UTC)

Category placement

(background: User_talk:Connel MacKenzie/Normalization of articles#20. controversial: Categories to bottom of entire page, not language section and User Talk:AutoFormat#Moving categories)

Question: should categories be at the end of the language section, or at the end of the entry?

Observations, end of entry:

  • some people have treated this as established standard (note it isn't written policy), routinely moving them to the end of entry
  • the pywikipediabot framework will silently move them to the end if any category op is done, unless one is careful to use the -inplace option (this is the standard on wikipedia)
  • there was no bot/automation that could place them at end of language section

Observations, end of language section:

  • this is the usual way most editors assume it should be done, and the way most cats in most entries are placed
  • allows cats to be edited with language section edits
  • the order of cats is visible in the entry; the list is in the order of appearance; since some are generated by templates in the language section, the others must be in the section to appear in the same alphabetical language order
  • we do have bot/automation now that can routinely place almost all cats in the correct language section

Before setting up a vote so we can write policy, comments? (and no snark about herding cats ;-) Robert Ullmann 14:29, 7 May 2007 (UTC)

I think that when possible, categories should be placed at what you might call their "logical" home — that is, at the same place that we'd put a template that included that category — and at other times, they should be placed at the end of a language section. —RuakhTALK 15:22, 7 May 2007 (UTC)
That can make them hard to find in long pages. I hate having to hunt to find a category placement that is stuck among the definitions (where it would be with a context template) or in the etymology section, etc. Since all categories are language specific, they should appear with the relevant language section. It makes most sense to have them collected at the end of the language section, so that editors can compare, edit, or amend them easily. --EncycloPetey 17:38, 7 May 2007 (UTC)
I have been placing categories at the end of the language section for a long time now. I think it is a good way to do it. -- A-cai 22:29, 7 May 2007 (UTC)
I think this is a good discussion. Please start a WT:VOTE to confirm the community (ahem, not my) opinion. As someone mentioned elsewhere, having a rule to follow is better than perpetually guessing. During the month-long vote, I won't move any more categories from language sections to bottom of the page. (BTW, why don't you have the same objection for interwiki links?) --Connel MacKenzie 23:46, 7 May 2007 (UTC)
I think because interwiki links are concerned with the pagename itself, whereas Categories are all language-specific. Widsith 10:24, 8 May 2007 (UTC)
Yes, the iwiki link for fr:foo is to the foo page in the fr wikt, not to foo in French; they are page references, and are all placed and sorted or removed by a bot. Robert Ullmann 11:51, 8 May 2007 (UTC)
End of language section, for sure. Otherwise you end up having to edit the whole page just to change the category, to move it into a template, to delete a language section outright, etc. DAVilla 12:01, 8 May 2007 (UTC)

See Wiktionary:Votes/2007-05/Categories at end of language section Robert Ullmann 13:21, 8 May 2007 (UTC)
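
To make the "end of language section" option concrete, here is a rough Python sketch of what such a rearrangement could look like. It is not the code of AutoFormat or pywikipediabot; the function name and regular expressions are assumptions for illustration, and it only handles plain [[Category:...]] links under level-2 language headings.

```python
import re

# Plain category links, plus the newline that follows them if any.
CATEGORY_RE = re.compile(r"\[\[Category:[^\]]+\]\]\n?")
# Level-2 headings such as ==English== (but not ===Noun===).
LANGUAGE_HEADING_RE = re.compile(r"^==[^=].*?==\s*$", re.MULTILINE)

def move_categories_to_section_end(wikitext: str) -> str:
    """Collect each language section's category links at the end of that section."""
    starts = [m.start() for m in LANGUAGE_HEADING_RE.finditer(wikitext)]
    if not starts:
        return wikitext
    bounds = starts + [len(wikitext)]
    pieces = [wikitext[:starts[0]]]  # anything before the first language
    for begin, end in zip(bounds, bounds[1:]):
        section = wikitext[begin:end]
        cats = [c.strip() for c in CATEGORY_RE.findall(section)]
        body = CATEGORY_RE.sub("", section).rstrip("\n")
        if cats:
            body += "\n\n" + "\n".join(cats)
        pieces.append(body + "\n\n")
    return "".join(pieces).rstrip("\n") + "\n"

page = (
    "==English==\n[[Category:English nouns]]\n===Noun===\nfoo\n\n"
    "==French==\n===Noun===\nfoo\n[[Category:French nouns]]\n"
)
print(move_categories_to_section_end(page))
```

Each category stays inside its own language section, so a section edit still shows (and can change) the categories that belong to it.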

Japanese transliteration

See Wiktionary talk:About Japanese/Transliteration ... An IP-anon user is making two major changes, both of which I'm inclined to agree with, but neither of which has been discussed, apart from several other IP-anon comments. Would someone else look at this, please? (Where is Tohru? ;-) Robert Ullmann 11:48, 8 May 2007 (UTC)

Wiktionary:About Japanese/Transliteration says at the top that wiktionary transliteration is based on Hepburn romanization. The Hepburn system uses the following scheme:
  • mizuumi (not mizuümi) lake
  • ōkii (not ōkī) big
I checked two Japanese dictionaries to verify the above:
  1. Japanese-English Character Dictionary, By Andrew Nelson →ISBN
  2. Sanseido's New Concise Japanese-English Dictionary →ISBN
I'm not sure where the ī and ü come from, but they are not Hepburn. -- A-cai 12:35, 8 May 2007 (UTC)

I got a comment from Tohru at Wiktionary_talk:About_Japanese#Wiktionary:About_Japanese/Transliteration pointing out that a rule in standardized romanizations is not to use a macron across morpheme boundaries, so 場合 would be ba (場) + ai (合) = baai, and 大きい would be ōki (stem) + i (inflection) = ōkii, just as 問う is to (stem) + u (inflection) = tou. However, 湖 looks like one morpheme (though it sounds like mizu + umi), so how would we determine whether to use a macron?

As for not using ī at all, that's reasonable -- let's just standardize one way or another.

I'm not sure whether the point of the diaeresis is to indicate pronunciation with the romanization, or to note that the contributor didn't forget to apply a macron, but either way it doesn't seem useful. Cynewulf 16:09, 8 May 2007 (UTC)

In the case of mizuumi, I think you answered your own question. Even though written with the Chinese character 湖, the word itself is composed of two morphemes: mizu water + umi ocean. I have not done any research on this, but it seems likely that the term mizuumi predates the arrival of kanji to Japan. As a side note, I wonder if the word came about because Japan is surrounded by oceans, in which case a large lake such as Lake Biwa might have been thought of as a fresh water (mizu) ocean (umi). Anyway, I digress. I agree with Tohru's analysis, although I must admit that I had never thought about it in those terms (probably because I started learning Japanese long before I knew what a morpheme was :) To me, the easiest way is to verify the spelling against any one of a number of Hepburn-based standard Japanese-English dictionaries (the ones I have support Tohru's explanation). -- A-cai 22:01, 8 May 2007 (UTC)
If you're going to sometimes use "uu", sometimes "ū", sometimes "aa", sometimes ā, (etc.), always "ii", it's going to be pretty confusing. If even some of you have to ask how it works, the newbie is going to be completely lost. Why not simply always use "aa", "ii", "uu", "ee", "oo"? I suggested as much at WT:AJA/Trans. talk, with a few more reasons. -- Coffee2theorems 00:26, 11 May 2007 (UTC)
Because uu and ū represent two different sounds. The ū represents a long vowel [ ɯː ] (ex. kūkō, [ kʰɯː.kʰoː ], airport), whereas uu [ ɯ.ʔɯ ] (ex. mizuumi, [ mizɯ.ʔɯmi ], lake) represents two vowels with a glottal stop in between them. Same thing for aa (ex. karaage, [ kaɺ̠a.ʔaɡe ], fried), as opposed to ā (I can't think of an example off the top of my head; it is not all that common). In the case of ii, it is always ii and never ī in Hepburn. This is because i and ī can be difficult to distinguish. There are more rules, but I don't think it's necessary to list them all here. My point is that I think we should stick with a recognized standard such as Hepburn because it is much more widely recognized than anything we could come up with. I think wiki policy is to follow recognized standards rather than creating our own. This is why I'm advocating staying with the rules of conventional Hepburn romanization. As far as being confusing to beginners, what exactly is not confusing to beginning students of Japanese? :-) Besides, it's still a heck of a lot easier than English spelling ;) -- A-cai 11:28, 11 May 2007 (UTC)
I can't read IPA, but when I've heard 湖 pronounced, there's been no noticeable stop at みず・うみ, it's been みずーみ instead. Same for おーくりします (お送りします) you often hear on TV. At the very least the difference is sometimes slight. Wouldn't mizu-umi be clearer if you want to insert a glottal stop there? I doubt "mizuumi" communicates your intention to most readers. According to Wikipedia "tookyoo" is modified Hepburn (and JSL), so it's not an original invention. Also romanization isn't going to be exact representation of pronunciation no matter what you do, and you could do a lot more if you wanted (indicating pitch for instance).
Most of the time when I've seen Hepburn it's not been a single system, as everyone does some things slightly differently from others. If you look at the current draft here, or the rules they ended up with in Wikipedia, it's not any unmodified standard. Simplifying the vowel convention is surely a minor modification, and one that would suit Wiktionary well. Most people won't know how to type (say) shūchū in the search box, and even for those who do it's more work than typing letters you have on every keyboard.
This was discussed some time ago and we decided here that Japanese Romanizations should be handled the way we do with Latin and Old English. The long vowels, where applicable, are indicated in the page, but the title of the page is without them. That means that you would search for sayonara (no diacritics) and there you would find ===Interjection=== sayōnara (that is, until someone recently started changing our established practice and declared sayonara to be a misspelling). After all, the main reason for using Japanese Romanizations at all is to make it easier for English-speakers to look up Japanese words. —Stephen 23:37, 11 May 2007 (UTC)
  • For Mandarin words, I have been creating entries for both (ex. hanzi, hànzì). It is a little more work, but provides the person looking for the word with more options. However, there are times when you end up with something like zhidao. -- A-cai 00:40, 12 May 2007 (UTC)
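
The practice Stephen describes (diacritics shown in the entry, but not in the page title) amounts to stripping combining marks from the romanization. A minimal Python sketch of that mapping follows; the function name strip_diacritics is hypothetical, and this is an illustration of the idea rather than anything Wiktionary's software is known to do.

```python
import unicodedata

def strip_diacritics(romanization: str) -> str:
    """Drop macrons and other combining marks, e.g. for a diacritic-free page title."""
    # NFD splits "ō" into "o" + U+0304 COMBINING MACRON.
    decomposed = unicodedata.normalize("NFD", romanization)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_diacritics("sayōnara"))  # sayonara
print(strip_diacritics("hànzì"))     # hanzi
print(strip_diacritics("kūkō"))      # kuko
```

Because the mapping is many-to-one, several distinct romanizations can land on the same title, which is why the page itself has to show the forms with their diacritics.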
Interesting solution. I didn't expect it, because WT:AJA/Translit. policy draft and its talk page don't mention it (and contradict it) and there are currently plenty of pages with macrons in their names. It certainly sounds like a good solution if you have to use macrons. The only drawbacks I can come up with are: you'd have to explain two (possibly already big) sets of otherwise unrelated words on the same page in some cases; how the word is typed by the user differs from what's on the page, which is counterintuitive; and it encourages people to use the macronless form as the romanization when they type elsewhere because it'd be seen as the best solution Wiktionary could come up with for typing.
You could also use the macron forms in page names, and additionally create a disambiguation page for every macron-containing word. This has the same drawbacks as the other solution except for the first one. An additional drawback is that one more click and page load is required to get to the correct word, but that's hardly a big problem. You could even use redirects (as much as they're hated here) in the unambiguous cases, as the romanization would neither be attested nor follow the (future) Wiktionary guidelines for systematic romanization.
Saying that an alternative form of romanization is a "misspelling" is strange indeed, especially as a misspelling in the Japanese language (and the "correct" Japanese romaji spelling would be in Kunrei if any, even if it's rarely used). -- Coffee2theorems 00:53, 12 May 2007 (UTC)
Actually, now that I looked at sayonara, it says さよなら in hiragana, so perhaps it means that that hiragana word is a misspelling of さようなら instead of the roomaji "sayonara" being a misspelling of "sayōnara"? I.e. that the writer really means the long form, さようなら, and romanizes it 'wrong' as "sayonara", vs. that the writer really means the 'wrong' short form, さよなら, and romanizes it 'correctly' as "sayonara". Regardless of the wrongs and corrects of romanization, both forms さよなら and さようなら exist in Japanese, it's not a misspelling. I'll change that entry. -- Coffee2theorems 01:53, 12 May 2007 (UTC)
Shouldn't write when tired.. The drawback "how the word is typed by the user differs from what's on the page" was meant for the second paragraph only, not the first, as in that case it's rather prominently shown in the page's title :-) -- Coffee2theorems 09:43, 12 May 2007 (UTC)
As Wiktionary is accessed on a computer, people will try to use the words they see in it on the computer, and will have problems with both typing them and communicating them whenever the character sets don't support them. I don't think Shift JIS and ISO-2022-JP do, though I'm not completely sure. They also won't work in much of Europe, which still has yet to switch to Unicode in many (most?) communication channels. Wiktionary:Transliteration (draft) also makes this good point: "In most or all cases, the romanization standard should contain no accents or diacritical marks, which makes it easy to read and look up a term by its romanization on systems where the proper characters are not supported."
You will also see that in practice people don't use macrons on the computer, except when they're making, say, a scholarly publication in English on the history of the Meiji period in Japan (more in the field of Wikipedia than Wiktionary, and a tiny minority use case). They will therefore have to cook up their own romanization method, and the most common ones they end up with are "simply ignore all the funny marks" style and waapuro. Providing one on Wiktionary that people can actually use which is not either of these choices (which I suppose everyone considers suboptimal) would be helpful.
Finally, I don't think that "beginners are confused anyway" is a good reason to make them more confused. Many simply give up once they see that the system is complicated and just learn the kana (and many recommend to start with the kana in the first place), and will then romanize using some random system (most likely waapuro). Witness the sheer amount of waapuro roomaji on the Internet. Also, for non-language-related things the "ignore all funny marks" way basically works because the vocabulary it is applied to is very limited and consists mostly of names, but with a dictionary it won't. Thus the viability of the most common escape path of those who think that diacritics are irrelevant (café vs cafe) or bothersome ("not on my keyboard or on this cafe's keyboard") disappears. Most likely they will not even notice and then they'll mix up all long and short vowels like they've always done (romaji, roomaji, who cares? but how about tori and toori, kori and koori, hori and hoori, ... it's a mess), resulting in chaos. -- Coffee2theorems 16:50, 11 May 2007 (UTC)[reply]
Coffee2theorems, when I put things like :) and ;) after a sentence, it is my attempt at humor, which sometimes gets lost when posting to message boards. My statement about "beginners are confused anyway" was not meant to be taken seriously. Anyway, a couple of points:
  • You will see it as mizuumi or mizu-umi, depending on which dictionary you use.
  • uu and aa vs. ū and ā: all I can tell you is that I do hear a difference between these two, although it is a very subtle difference that may be hard to pick out. However, I agree with you in the case of ii and oo that they are almost always the long vowel (no stop in between). I used to be pretty fluent in Japanese, but now I'm a little rusty, so I would not consider myself the ultimate authority. Perhaps Tohru or another native speaker could weigh in on that one.
  • Tookyoo vs. Tōkyō: You are correct that there are some books that use the oo instead (ex. Learn Japanese: New College Text). However, in my experience ō is still by far the more common way.
  • diacritics and the computer. I understand your concern, but this is not an issue confined to Japanese. Any language which makes use of diacritics poses a problem for the inexperienced computer user. What I can tell you is that Wiktionary makes an attempt to compensate for this by offering a pull down menu just below the "save page" button. You can find just about any diacritic you want, simply click on the symbol and it will appear in the edit box (wherever your cursor is placed).
  • beginning students and dictionaries: it has been my observation that beginning students don't generally use dictionaries (regardless of language). For the beginning student, the glossary at the back of their beginning textbook is usually sufficient. It is usually not until they are more at the elementary to intermediate level that a dictionary comes into play. By that point, they seem to be able to work their way past inconsistencies in standards. Of course, I don't have any statistics on hand to back up my claim.
In summation, I only point out the above as an occasional contributor of Japanese words (I mainly work on Mandarin and Min Nan words, since I'm more fluent in those languages). I think we need to get some more opinions from the more regular contributors of Japanese before making any changes to existing policies. -- A-cai 22:07, 11 May 2007 (UTC)[reply]
The difference between Japanese and other languages that use diacritics is that the diacritic form in (most?) other languages is the one native speakers themselves use, whereas in Japanese it's just one romanization method out of many, and few people use it on the computer. Because of that it's not well supported anywhere, including in Japan. And because it's not the native form of the text, you get to choose a romanization method without diacritics for convenience if you want. You can't well go tell the French not to use diacritics, but the Japanese don't use them themselves, so there's no need to.
I think we need more opinions, too. The policy in question is just a draft and a bit peculiar one at that ("ōümi" is a type of romanization I've never seen anywhere), so it would benefit from discussion in general. As for dictionary use, I know I started using a dictionary immediately (and generally do), and have seen other beginners who've done it. The dictionary in question was Jim Breen's "edict" though, and that one doesn't have romaji at all, so I got a forced crash-course in kana (this is intentional by J.B.). I also ended up using Hepburn-like waapuro roomaji (i.e. "shi", not "si") because it was simple, worked everywhere, and I hadn't seen anything better (macron-hepburn worked almost nowhere and I couldn't get macron-characters with one keypress). To this day I sometimes use it accidentally, it's not a pretty sight :-) -- Coffee2theorems 23:46, 11 May 2007 (UTC)[reply]

re: "diacritics poses a problem for the inexperienced computer user".

Here we have the nub of the problem. Too many of our editors consider people who don't know how to use diacritics as "inexperienced computer users", and thus some sort of minority group who can be safely ignored - they'll grow out of it or die out. Whereas, having worked in the computer industry for 28 years, I bet I couldn't find one person in the hundreds of very experienced computer users I know who give a damn about diacritic marks, let alone know how to or bother to use them. What we need to address is the ability of the average user of an English dictionary. And I'd bet a lot that the average user of an English dictionary does not know how to put diacritic marks in on a computer. Just stop kidding yourself that the group of language nuts currently around in Wiktionary is in any way average or representative of the people who could and would make more use of a decent Wiktionary if it wasn't so bloated and hard to use.

Answer the question please. Is The English Wiktionary to be useful and easy to use for the average person who wants to use a comprehensive and informative English Dictionary, or is it to be the domain of language specialists, including people who give a damn about words in the Yatzachi Zapotec language, and the diacritic marks some pervert has decided are necessary in words such as Beaṉ ?--Richardb 23:58, 12 May 2007 (UTC)[reply]

Richardb, your comments seem a little off point. We were talking about determining a standard for Japanese Romanization at Wiktionary. In keeping with Wiktionary policy, it is reasonable to advocate adopting a widely accepted standard. If that standard uses diacritics (as is the case with many languages, including Japanese), are you suggesting that we ignore the use of diacritics because a lot of people are uncomfortable with diacritics? If that is the case, I have to disagree with your premise. The average user to which you refer (monolingual English speaker?) probably doesn't know Japanese either. Why then would such a person find typing ōkii any more difficult than typing 大きい? Both would present challenges for a non-expert. I guess I do not quite understand your point. Are you suggesting that we abandon all foreign languages at Wiktionary and just focus on English? If so, it's far too late for that :) Alternatively, are you implying that searching for words with diacritics is too much of a pain for the average user? Now that, I can sympathize with (and no, I do not disregard such users)! I have taken steps in the case of Mandarin to address that issue (see above comment about hanzi vs. hànzì). Do you disagree with my proposed approach? It would help if you could be more specific. -- A-cai 05:40, 13 May 2007 (UTC)[reply]
I believe the precedent set with Old English is to hard-redirect something like ōkii to okii and explain the form (as a soft redirect to 大きい) there. Personally, the only diacritic character I've ever memorized the whacko-alt key sequence for was "é", but with the prevalence of UTF-8, that has changed as well. There are a variety of ways people can cut-n-paste from elsewhere, or even turn on the WT:PREF to allow special characters; but in general those methods require a very significant technical know-how. Linguists, researchers and people with European keyboards obviously don't have as hard a time. But I strongly agree we should be directing our efforts more toward the readers of Wiktionary. --Connel MacKenzie 10:39, 13 May 2007 (UTC)[reply]
In my view, we should make an attempt to cater to both the general reader and the linguistics professor. This is why I advocate separate entries for ōkii and okii (or better yet, ookii), without redirects. If you simply redirect ōkii to okii without any explanation, you are implying that okii is every bit as valid a spelling as ōkii, which is not entirely accurate. In fact, the only reason for including okii in Wiktionary is that the "typical" user finds it inconvenient to type ōkii (the "correct" Hepburn spelling). Also, typing ōkii into the search box is only one method to find the entry. Hyperlinks from another page is another way. Another way would be to come across it on a page like this, then copy and paste it into Wiktionary's search box. -- A-cai 12:17, 13 May 2007 (UTC)[reply]
There are two cases: romanization for convenience of typing, and romanization people are likely to use. If you're concerned only with the former, "ookii", "tooi", "shuuchuu", etc. are surely better romanizations than "okii", "toi" and "shuchu". Will many people try to use forms such as "okii", "toi" and "shuchu" for non-English Japanese words? At least I wouldn't want to encourage such horrendous practice by redirecting people from "ōkii" to "okii", as they will surely be misunderstood or not understood at all if they use such things elsewhere. Also, it is not the case that macrons are available on all European keyboards (are they available on any?). -- Coffee2theorems 13:17, 13 May 2007 (UTC)[reply]
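For reference, the two ASCII fallbacks discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not anything Wiktionary runs: the function names strip_macrons and double_long_vowels are made up for the example, and doubling the vowel is just one waapuro-like convention (it does not recover the kana spelling, e.g. おう vs. おお).

```python
import re
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON, produced when "ō" is decomposed (NFD)

def strip_macrons(text):
    """'Ignore the funny marks' style: ōkii -> okii."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))

def double_long_vowels(text):
    """Waapuro-like style (an assumption of this sketch): ōkii -> ookii."""
    decomposed = unicodedata.normalize("NFD", text)
    # Replace "vowel + combining macron" with the vowel written twice.
    doubled = re.sub("(.)" + MACRON, r"\1\1", decomposed)
    return unicodedata.normalize("NFC", doubled)

for word in ["ōkii", "tōri", "tori", "Tōkyō"]:
    print(word, strip_macrons(word), double_long_vowels(word))
```

Stripping maps tōri and tori to the same string, which is the long/short vowel collision described above; doubling keeps the two apart while remaining typeable on any keyboard.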

cafe and café

Here is a puzzle. Why do we end up with two English entries for cafe and café, with the potential to get out of synchronisation? Why doesn't cafe just have a note about it being an alternative spelling of café, and the user is then given a link to go to café, where the English and French (and other) explanations exist? Hey, I don't know. And on that note of ignorance I'll bow out of this particular debate. Except to repeat that the average English user (monolingual or otherwise) of this, the English Wiktionary, does not know how to use diacritic marks, no matter how experienced they are as computer users. And we would do well to remember it.--Richardb 14:56, 15 May 2007 (UTC)[reply]

My personal preference to go a long way to resolving all this would be to split this Wiktionary into two. One being the English Wiktionary, and one being the Multilingual Wiktionary. Then we could leave all you jolly linguists to play in your own sandpit, while we fans of an English dictionary could get on with developing an English dictionary to suit us, without carrying so very much overhead. (Just contend with the conflicts of American vs UK vs Commonwealth etc) And I really am serious about that suggestion. Seems the other Wiktionaries do pretty well without bothering to try to be multilingual. But, I don't expect we'll get our own English Wiktionary back until Ultimate Wiktionary is in place. More's the pity--Richardb 14:56, 15 May 2007 (UTC)[reply]

Richardb, other wiktionaries are multilingual. Granted, most are not as multilingual as en.wiktionary; but that is because they usually have a smaller user base. I actually do agree with you about café vs. cafe. As a contributor of Mandarin words, I deal with this problem on steroids! Because of current software limitations, if I want to be absolutely sure that a user can find the Mandarin word for foreign country, I would have to create separate entries for 外國, 外国, wàiguó and waiguo. These all represent the exact same word in Mandarin (but get counted as four different Mandarin words, which skew our stats!).
In the case of an English only dictionary, I think you know where I stand on that one already. I would just add that I think the English portion of Wiktionary has benefited from the presence of foreign languages on Wiktionary. Comparing English to one or more other languages will inevitably offer insights about English itself. Specifically, the definitions have become sharper for many words. This becomes necessary if you want to offer translations of different senses of the word in question.
I think what we're all saying is we need a way to store and retrieve the information in such a way as to conform to individual user preferences. If all you want to see is English, you should be able to set that in your preferences. I don't think splitting Wiktionary in two would be the best way to accomplish this. If you do that, you will surely drain momentum from one or the other. -- A-cai 10:31, 16 May 2007 (UTC)[reply]

Standardising region templates

There are {{US}} and {{UK}} templates, but some region templates are longer, such as {{Australian English}} and {{Irish English}}. Is there any chance those longer templates could be changed to just the country name to standardise them? It does not affect many words, but it would standardise the templates. Pistachio 22:36, 9 May 2007 (UTC)[reply]

We do have {{AU}}, which redirects to {{Australian English}} —RuakhTALK 01:52, 10 May 2007 (UTC)[reply]
I am responsible for creating some of the English and Spanish regional context labels. I will have a look into standardising all the Regional label templates soonish. Some of them are a little too long e.g. {{Northern English dialect}} which I'd prefer to be Northern English (but there are some issues with that confusing folks across the pond). Also UK - now there's an interesting one, will we separate into British again since the Scots seem hell-bent on leaving the Union?--Williamsayers79 08:28, 10 May 2007 (UTC)[reply]
In that case we can use {{English}}, though that might be confusing for some :) Actually, in all seriousness there should be an English template because Scottish, Welsh and English usage does vary. Regarding the country templates, should "Indian English" and "Canadian English" become "India" and "Canada" in line with "US" and "UK"? Yesterday I added {{India}} when editing solicitor before I found the {{Indian English}} template, which is why I am asking about this issue. Pistachio 12:23, 10 May 2007 (UTC)[reply]
Also, I want to add that many words are tagged with "US" and "UK" but it seems for some reason Canadian, Irish and Australian (amongst other) usage of the same words is not recorded. That is a shame, as I know this is important information to a lot of people studying English overseas. Pistachio 12:29, 10 May 2007 (UTC)[reply]
This results from the fact that the majority of regular contributors are from the US or the UK. The best way to have more Canadian, Irish, Indian, and Australian usage notes is to acquire more Canadian, Irish, Indian, and Australian editors. Personally, most resources I have discuss only US & UK senses, though I have seen once a dictionary of Indian idiomatic usages. --EncycloPetey 15:40, 10 May 2007 (UTC)[reply]
Yes. I for one need an East African English tag of some sort to add some entries. Robert Ullmann 15:34, 11 May 2007 (UTC)[reply]

Categorization scheme for rhymes?

Is there any kind of formal categorization scheme for rhymes (in the sense that, for example, IPA and SAMPA are categorization schemes for sounds)? I'm thinking that there really should be some way to reference a rhyme other than by reference to a word within that category of rhymes. Cheers! bd2412 T 06:20, 10 May 2007 (UTC)[reply]

I'm not sure I understand what you're asking. Within the Rhymes pages, the first subdivision is by primary IPA vowel sound of the stressed syllable (e.g. /ɪ/). Each such IPA sound has a page on which the various rhymes beginning with that sound are listed (e.g. /ɪd/, /ɪn/, /ɪŋ/), and these are listed in a table alphabetized by the English standard spelling equivalent for those sounds. Each has an example word given next to it. Each also links then to a page listing words that rhyme. So, when I want to know whether a particular rhyme page exists, I can scroll to the appropriate section of the list and look. --EncycloPetey 15:34, 10 May 2007 (UTC)[reply]
Well I see what you mean, but it's not a particularly user-friendly system - you have to be familiar with the IPA system going in to get any utility out of it. bd2412 T 18:39, 10 May 2007 (UTC)[reply]
As the person who has entered almost all of the rhymes, I should explain.
There has to be some way of ordering the words, and since rhymes are concerned with the pronunciations of words, not their spellings, this has to be by sound rather than by spelling.
At the moment, the ordering of the links for each vowel is approximately alphabetical by the commonest spelling of the sound; for example, the page for /æ/ has /æn/ (commonly spelled "an") followed by /æŋ/ (commonly spelled "ang"), /æp/ (commonly spelled "ap"), etc. The ordering is not very precise, and cannot be, because some sounds have many possible spellings (eg, /ə/; I tend to treat this as if spelled "a", as in "kinda").
The links for each vowel are currently sufficiently few that they can all appear on one page. As these lists get longer (and, in theory at least, they could become much, much longer) some subdivision scheme will be necessary. I think that EncycloPetey's scheme could be done, provided each link to the subpages also spelled out the commonest spelling or spellings of the IPA character(s) following the vowel, or maybe with example words, as each link currently has. Hence the links mentioned above would appear as /ɪd/ (as in "kid"), /ɪn/ (as in "kin"), /ɪŋ/ (as in "king").
BD2412, note that there is a bot that periodically (though I don't know the last time it was done) scans the rhymes pages and adds a rhymes link to the entries linked to from them. So if you want a rhyme for kid, kin or king, you can just go to those pages and follow the rhymes link in the pronunciation section to go directly to the right page, rather than having to scan through a long list of links. — Paul G 15:05, 15 May 2007 (UTC)[reply]

Order of definitions

A very long time ago, the order that definitions appear was discussed inconclusively.

In the last two years, other mirror sites have started using Wiktionary content in new and creative (useful) ways. This is a Good Thing.

Some of them use only the first definition, others use only the first definition within a given "part-of-speech" heading. Some limit definitions to English, others don't filter, but instead reformat all content.

In light of some of the (legitimate, good) derivative uses, I think it is time to reopen discussion on this topic. Can someone point to the existing verbiage we have in our current policy scheme? Offhand, I recall only one policy page that might discuss it, and haven't searched the archives yet, to glean other considerations on the topic.

My general inclination is that we should think about rewording it to emphasize that the primary definition should be listed first (if we don't already.) Perhaps we also could devise criteria for what we (Wiktionary) mean by "primary sense" in this context.

--Connel MacKenzie 22:40, 10 May 2007 (UTC)[reply]

I'm not sure any tool that uses only the first definition can be very useful, even if the first definition is the most common, because sometimes the first few definitions are all very common. At least, I can't imagine a use for a dictionary tool that knows what the nouns theosophy and reification are, but only has one sense for each of the nouns bank and pen.
I don't particularly object to putting primary senses first (as you suggest, that would open a can of worms in deciding what the primary sense is, but we have enough cans of worms open that I don't think one more will be a big deal), but I think it's more important that we be aggressive in using context labels like (dated) and (rare), as well as like (US) and (coarse), so that a tool that's interested only in currently-common words and senses or only in words and senses you'd find in polite British English, or whatnot, can examine the tags and filter accordingly. Heck, someday Wiktionary itself might allow such user-specified filters.
RuakhTALK 17:45, 11 May 2007 (UTC)[reply]
Earlier I've mentioned the possibility of having multiple sort orders. Obviously due to software limitations Wiktionary itself isn't yet at the point to implement this, but we can certainly already include it in the defs for editors and 3rd party users of our data to make use of. We can make templates which create no output or maybe just an HTML comment but which people writing parsers can make use of. How about something like this:
 # {{ord-hist|1}} {{ord-mod|2}} (obsolete) a repairer of lutes and lyres
 # {{ord-hist|2}} {{ord-mod|1}} (slang) a computer hacker
My apologies for not being able to think of a real world example. ord-hist would be the historic order, ord-mod would be the modern order by usage. Others would be possible. — Hippietrail 18:24, 11 May 2007 (UTC)[reply]
  • Oh and by the way, just a bit of nitpicking. It's inaccurate to call all labels context labels. Context is only one such case. There are actually several types of labels: region (US, UK), register (formal, poetic, colloquial, slang, technical), and I can't remember the correct term for the group (dated, archaic, obsolete, historic). There's probably others. Look in the front of your favourite big print dictionary - they all have a section explaining their labels.
  • So what would the general term be? Sense labels? I call them context labels not so much because they necessarily describe the context (though I think I would consider regional context, language register, and historical context to be kinds of context) as because we include them using the {{context}} template. —RuakhTALK 20:56, 11 May 2007 (UTC)[reply]
Likewise discussion at {{context}} suggested renaming. To what? DAVilla 23:05, 11 May 2007 (UTC)[reply]
That is an interesting approach. I was thinking of the opposite, though...something to make external parsing easier, not harder. On another note, for a minor simplicity nit-pick, I'd suggest {{order|hist=1|mod=2}} or {{order|1|2}}. --Connel MacKenzie 10:26, 13 May 2007 (UTC)[reply]
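To make the proposal above concrete, here is a rough sketch of how a third-party parser might read such ordering markers. Note that {{ord-hist}} and {{ord-mod}} are only the hypothetical templates suggested in this thread, not existing ones, and the function names below are invented for the example.

```python
import re

# Matches the hypothetical {{ord-hist|N}} / {{ord-mod|N}} markers proposed above.
ORD = re.compile(r"\{\{ord-(hist|mod)\|(\d+)\}\}\s*")

def parse_definition(line):
    """Return ({'hist': n, 'mod': m}, definition text) for one '#' line."""
    keys = {kind: int(n) for kind, n in ORD.findall(line)}
    text = ORD.sub("", line.lstrip("# ")).strip()
    return keys, text

def sort_definitions(lines, order="mod"):
    """Sort definition lines by the requested order; unmarked senses go last."""
    parsed = [parse_definition(line) for line in lines]
    parsed.sort(key=lambda item: item[0].get(order, float("inf")))
    return [text for _, text in parsed]

defs = [
    "# {{ord-hist|1}} {{ord-mod|2}} (obsolete) a repairer of lutes and lyres",
    "# {{ord-hist|2}} {{ord-mod|1}} (slang) a computer hacker",
]
print(sort_definitions(defs, order="mod"))
print(sort_definitions(defs, order="hist"))
```

A single-template form such as {{order|hist=1|mod=2}} would need only a different regular expression; the sorting step stays the same.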

(I know this has taken a long time; I spent a good deal of un-anticipated time re-structuring code to allow the level corrections to conform to WT:ELE to be turned on or off, and fixing the resulting side-effects; and there have been other interesting discoveries, such as the number of entries that start with Etymology at L4 ...)

Please refer to the documentation at User:AutoFormat

AutoFormat fixes several classes of syntax and structure errors that are common in new entries and in existing entries that have never been corrected. Some of the corrections are simply wikitext spacing that is preferred; some affect the visible appearance of the page. Examples are correcting misspellings in headers (Pronounciation) and correcting headers to sentence case (Related Terms). It is also updating some template calls.

Two classes of action have been suspended/excluded for now:

  • moving categories to the appropriate language section(s); policy vote pending
  • correcting levels of headers to WT:ELE standards for L4; vote TBD

Before setting up a vote, I'd like comments, so it can be set properly. (And please see User Talk:AutoFormat). Robert Ullmann 15:54, 11 May 2007 (UTC)[reply]
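As an illustration of the two header corrections mentioned above (misspellings and sentence case), a minimal sketch could look like the following. This is not AutoFormat's actual code, which is documented at User:AutoFormat; the fix_header name and the one-entry spelling table are assumptions made for the example, and only headers at level 3 and below are touched so that language names at level 2 keep their capitalisation.

```python
import re

# Level-3-or-deeper headers such as ===Related Terms===
HEADER = re.compile(r"^(={3,})\s*(.*?)\s*(={3,})\s*$")

# Illustrative table only; a real list of known misspellings would be longer.
SPELLING = {"Pronounciation": "Pronunciation"}

def fix_header(line):
    """Correct a known misspelling and convert the header to sentence case."""
    m = HEADER.match(line)
    if not m or m.group(1) != m.group(3):
        return line  # not a header, or unbalanced markers: leave untouched
    title = SPELLING.get(m.group(2), m.group(2))
    title = title[:1].upper() + title[1:].lower()
    return f"{m.group(1)}{title}{m.group(3)}"

for line in ["===Pronounciation===", "====Related Terms====", "==English=="]:
    print(fix_header(line))
```

Lower-casing everything after the first letter is a simplification; a header containing a proper noun or an abbreviation would need an exception list.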

AutoFormat really is excellent. As it represents a pretty new approach, perhaps the WT:BOT requirements should be bent for it. The reason I suggest that, is that I would really like to see individual debates on the different aspects it auto-corrects. If they are all lumped together it would never pass...but I've seen that individual (possibly contentious) items can be turned off individually. So I think it would be better to have one blanket approval for the technique, with some "less formal" discussion/approval process for the individual items. (Much like I was surprised at User:TheCheatBot's approval process, I never dreamt the items that ultimately proved to be the most contentious, were even questionable when I started with it all.)
Thank you again, for all your hard (and excellent) work on AutoFormat. --Connel MacKenzie 17:57, 11 May 2007 (UTC)[reply]
Thank you. I do think there is a basic class of stuff it does that isn't going to be in any way contentious: most spacing, rules between languages, fixing spelling and sentence case in headers as I mentioned above. As you say, it can be surprising, but AF has been doing a number of things very visibly without any complaint (gender templates in trans sections, unlinking top40, ...). Which is why I'm asking. Can always set a vote and then withdraw/restart if there are other issues? Robert Ullmann 13:45, 12 May 2007 (UTC)[reply]
Historic note: I don't recall how many votes/revotes there were for TheCheatBot et al. I think it was 7. Anyhow, I didn't start putting it together until I was personally convinced there was nothing contentious about any of it. So, it was quite a shock to have vote after vote turn into colorful discussions. And if you count the "final" votes for each of the separate bots separately, it would be a total of about twelve votes. I do not want to see that happen with your bot! Without a "sub-item" vote mechanism, the bot approval will very likely fail. And that would be tragic. --Connel MacKenzie 10:19, 13 May 2007 (UTC)[reply]
There are some serious differences in the "environment": for example TheCheatBot explicitly avoided using a template for plurals, and a goodly amount of the discussion was about the format of the generated plurals; with {{plural of}} firmly established, none of that would have been an issue. In any case, I'm not going to run 7 rounds of voting; try once, and then should know where we are. There are two "sub-items" that people wanted to consider separately already broken out. If we identify more, okay. Robert Ullmann 12:50, 15 May 2007 (UTC)[reply]
The latest incarnation of AutoFormat seems to work quite well. It even leaves notices about problems that it recognizes may not be satisfactorily resolved. While it may lead to more formatting policy discussion as a result of its efforts, I see this as a positive outcome. --EncycloPetey 01:43, 15 May 2007 (UTC)[reply]

Vote set at Wiktionary:Votes/bt-2007-05/User:AutoFormat.

(Why does the template say 2 days or one week? how can a two day vote ever be useful?)
Connel, if you'd like a colourful discussion on some other sub-item, by all means start one! ;-) We'll see how it goes. Robert Ullmann 13:23, 15 May 2007 (UTC)[reply]

Does AF touch pages without making visible changes, for following the conventions of spacing only, or does it hopefully skip those types of edits?

I could think of a lot of other formatting tasks that AF could do, but I'm not sure how much support they carry. DAVilla 19:53, 15 May 2007 (UTC)[reply]

AF doesn't make an edit only for "minor spacing"; that is things that don't affect the page rendering. But if it has to remove the {{rfc-auto}} tag anyway, it does. Things considered "minor spacing" include fixing category: to Category and so on. See the doc.
Any ideas, please add as sections to the talk page! We can talk ;-) Robert Ullmann 23:59, 15 May 2007 (UTC)[reply]

This word, with its special affection of the underscore on the n, raises a question. If you check the Ethnologue entry for zav, you'll find Yatzachi Zapotec is spoken by an estimated 2,500 people, largely illiterate.

My guess is it would not really pass the criteria for inclusion. Any challenges to that ?

My view is that the inclusion of this word in Wiktionary, and so many thousand like it, greatly degrades the usefulness of this dictionary as an English dictionary.

Maybe it's time to consider converting this Wiktionary to an "International" Wiktionary and restart a real English dictionary, without these hundreds of thousands of words from other languages that just make it so difficult to use this Wiktionary as an English wiktionary.--Richardb 23:39, 12 May 2007 (UTC)[reply]

When you raise an issue, please post it in one location instead of three. I have already replied at one of the other locations where you commented. Our basic tenet is "All words in all languages". --EncycloPetey 00:08, 13 May 2007 (UTC)[reply]
While I think en.wiktionary would have benefited enormously if it had stuck to English, the multilingual aspects have too much momentum to simply fix it, now. The two obvious directions for working-around the limitations of the multilingual dictionary are: 1) Splitting out ("forking") an English-only project separate from, and 2) Adopting Hippietrail's language separation software. --Connel MacKenzie 10:09, 13 May 2007 (UTC)[reply]
Richardb, how in the world does including a non-English word (even an obscure one) degrade Wiktionary as an English dictionary? That doesn't make any sense at all. I doubt you would ever even see the word Beaṉ on Wiktionary unless you went out of your way to look for it. Let me guess: your "typical" English speaking reader somehow doesn't know the meaning of the English word bean, looks it up, and is utterly confounded by the "see also" line at the top; is that what you're saying? I'm sorry, but are you kidding me? Simply put, English is not a pure breed; it is a mutt. It is an inclusive language, which is one of the reasons that it has become the dominant international language. In fact, did you know that only about 25% of English words actually are descended from English (see w:English_language#Word_origins)? Wiktionary is the only dictionary where you can look up rendezvous, link to its French ancestor rendre, and then link to the French word's Latin ancestor reddere. I can do all of this without consulting three different dictionaries. Can you do that on an English only dictionary? I don't think so! In my opinion, Wiktionary would have been largely ignored had it been confined to English only. There are plenty of other online English dictionaries already :) -- A-cai 13:32, 13 May 2007 (UTC)[reply]
I would like to second A-cai's comments. I fail to see how Beaṉ in any way gets in the way of looking up bean. Seems to me that, what with English entries always at the top, foreign words are rarely a hindrance. If you're arguing that our efforts are divided, then perhaps. However, I would not be working on this project were it an English-only one, and I imagine that many of our foreign language contributors feel the same way. Finally, I am a huge fan of Hippietrail's new software, and am curious as to how we would go about incorporating it into Wiktionary. Atelaes 01:28, 15 May 2007 (UTC)[reply]
How does it degrade the use of Wiktionary as an English Wiktionary ? Simple. Whenever I do a search, or a list of categories maybe, or similar, the search results or lists are enormously over populated with so much stuff that is not English, it makes it very hard at times to find the actual word or expression or combination of words that I am looking for. (I was actually looking for the expression "hill of beans". No, I was not needing to look up the meaning of the word "bean" :-) )
We need a software answer. The user should be able to specify, in preferences, which languages he is interested in, and from then on he should see only those languages. Problem is, we are never going to get that software courtesy of Wikipedia, 'cos Wikipedia doesn't have Multilingual versions. Very basic problem. Ultimate Wiktionary is our only hope ! But can I hold my breath that long ?--Richardb 15:23, 15 May 2007 (UTC)[reply]
I wholeheartedly agree that users should be able to specify their language (among other things) and never be bothered with foreign language entries again, if they so desire. Again, how do we get Hippietrail's stuff working here? Atelaes 21:00, 15 May 2007 (UTC)[reply]
Where is "Hippietrail's stuff" described? We'll need that before we can start a vote on it.--Richardb 05:26, 24 May 2007 (UTC)[reply]
Start a one or two month WT:VOTE for it. --Connel MacKenzie 04:00, 16 May 2007 (UTC)[reply]
I think I understand what happened. Perhaps you typed hill of beans into the search box but did not find it (because the entry was not created until 13 May 2007). Failing that, you went to Special:Allpages/hest to see if you could find something close to it, but all of the foreign words made the list much longer? If that is what we're talking about, I agree that a software fix could be the answer. For example, in addition to Special:Allpages, we would need something like Special:AllEnglishpages, Special:AllFrenchpages or something similar. Alternatively, you could have Special:Allpages, but with an extra filter for language (based on L2 language headers). -- A-cai 22:21, 15 May 2007 (UTC)[reply]
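
To make the "extra filter for language" idea a bit more concrete, here is a rough sketch in Python rather than MediaWiki code. It assumes the filter would read the L2 headers (lines like ==English==) out of each page's wikitext; the function names and the sample page texts are made up for illustration and are not an existing special page or API.

```python
import re

# Matches a level-2 heading such as "==English==" but not "===Noun===".
L2_HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)


def languages_in(wikitext):
    """Return the set of L2 language headers found in a page's wikitext."""
    return {match.group(1) for match in L2_HEADER.finditer(wikitext)}


def all_pages(pages, language=None):
    """List page titles alphabetically, optionally keeping only pages
    that have a section for the given language."""
    titles = [
        title
        for title, text in pages.items()
        if language is None or language in languages_in(text)
    ]
    return sorted(titles)


# Hypothetical sample data, not copies of real entries.
pages = {
    "hill of beans": "==English==\n===Noun===\n...",
    "hest": "==Danish==\n===Noun===\n...\n==Norwegian==\n===Noun===\n...",
    "bean": "==English==\n===Noun===\n...",
}

if __name__ == "__main__":
    print(all_pages(pages, language="English"))
```

Calling all_pages(pages) with no language gives the unfiltered list, which corresponds to what Special:Allpages shows today.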

Rubies (furigana) in Japanese

Could we have a template (and a mention in WT:AJA) for rubies in Japanese text? Often the original text in a quotation has rubies and they should be reproduced in order to be faithful to the original written form. Where and how often rubies are used immediately tells something about the target audience and thus the text itself, and some rubies are even impossible to determine from the text they're attached to because they're used as a stylistic device or to provide additional meaning. Furigana is also the standard way (nothing else is ever used in running text) to provide readings in Japanese.

Some days ago I added a few quotations to 映る. (Note: later I read WT:QUOTE and noticed that quotations should be from printed publications, but that's a separate matter) When I did that, I noticed that long kana strings are (unsurprisingly) not easy to read nor do they look nice. Here's one quotation with links added for each kanji word:



In addition to this there's translation, and there should also be romaji (which I didn't add). Having romaji and the original text is fine, but this intermediate kana stage is superfluous. The romaji already tells how to read the text. The kana here doubles the amount of Japanese text in each quote, and if it's used to give the kana from the original rubies (here the original had none, but often that's not the case), you have to read the text twice to get the original representation. The standard way is to use rubies, and when they're not available on a computer, they're usually rendered like this (I'll call this parenthesis-ruby):


To me this looks worse than just giving kanji only, but better than giving both kanji and kana on separate lines. Parenthesis-ruby works best when it is done for the unobvious cases only, but it's not impossible to always use it.

If you use the ruby tags like b:Japanese does, browsers which support them will show you the rubies rendered nicely, and other browsers will show parenthesis-rubies. That would be, if not now, at least in the future the best solution (though I don't see why not now). By using templates like they do (b:Template:Ruby), you could switch between various representations as necessary, and the information would always be there in easily processable format.
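
For anyone who hasn't seen the markup, here is a small Python sketch of what such a template would need to emit. It assumes the usual HTML ruby elements (ruby, rb, rt, rp); the rp elements hold the parentheses that only show up in browsers without ruby support, which is how the parenthesis-ruby fallback comes for free. The function names and the segment format are invented for this sketch and are not how b:Template:Ruby is actually written.

```python
from html import escape


def ruby(base, reading):
    """Wrap base text and its reading in HTML ruby markup.

    Browsers with ruby support hide the <rp> contents; others show
    them, which yields base(reading).
    """
    return "<ruby><rb>{}</rb><rp>(</rp><rt>{}</rt><rp>)</rp></ruby>".format(
        escape(base), escape(reading)
    )


def render(segments, use_ruby_tags=True):
    """Render a list of (text, reading) pairs; reading is None for
    text that carries no ruby."""
    parts = []
    for base, reading in segments:
        if reading is None:
            parts.append(escape(base))
        elif use_ruby_tags:
            parts.append(ruby(base, reading))
        else:
            # Plain-text parenthesis-ruby for places where tags are unwanted.
            parts.append("{}({})".format(escape(base), escape(reading)))
    return "".join(parts)


if __name__ == "__main__":
    sample = [("首", "くび"), ("に", None)]
    print(render(sample))
    print(render(sample, use_ruby_tags=False))
```

Keeping the text as (base, reading) pairs is the point: the same data can be switched between representations without touching the quotation itself.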

I was curious what a table implementation of such a template would look like, and rendered the text above using tables. This doesn't look entirely satisfactory (I just skimmed a Wikipedia tutorial on tables, so..), but even so, it looks better than the other options above:


Besides the technical matters, there's also the question of how to distinguish rubies that were in the original text from those added by a Wiktionarian. This could be done e.g. by marking any modifications/additions with [] like is usual in English quotations. Or one could simply always reproduce the original text faithfully and rely solely on romaji for explanation of the reading - why give it twice in different formats? (applies to quotations only, elsewhere kana must of course be given, as the orthography can't be back-derived from romanization).

OK, that was a pretty mixed bag of ruby-related things. Thoughts? :-) -- Coffee2theorems 01:01, 13 May 2007 (UTC)[reply]

As there was no answer here and the source for a quotation I was adding had furigana, I was bold and added the template Template:ja-furigana and used it at お気に入り. I haven't made templates before, so I'm not sure if the code is good or not. -- Coffee2theorems 22:53, 16 May 2007 (UTC)[reply]
Take a look at the entry now. I think it is more like what you were after. However, I would recommend using romaji as the furigana instead of hiragana (technically a no-no if this were a Japanese publication). That way, it will be easier for the non-Japanese speaker to follow. -- A-cai 23:13, 16 May 2007 (UTC)[reply]
The links are a bit problematic now. It links to the kanjis, but a link like 冷える is more useful than える (or would be if someone had created the entry at 冷える :-). Enlarging the font for only those parts that have furigana is a bit unaesthetic, too. The furigana I put in is taken directly from the original source, so I don't think romanizing it would be a good idea, as that would misrepresent the text. Besides, there's already one romanization for the non-Japanese speaker. I don't see the need for another one, especially as the parts of the line without furigana would still be in kana. -- Coffee2theorems 23:48, 16 May 2007 (UTC)[reply]
I modified my template, copying some ideas from the template you used (Ruby-ja), including the use of ruby tags. AFAIK they work only on IE and are an XHTML 1.1 feature, not XHTML 1.0 (which makes validators complain, as mentioned on the Wikipedia talk page of the Ruby-ja template - it's only used in one place in Wikipedia). Because of these reasons, I didn't originally put them in the template, and I'm still not sure if they're OK to use (they do work in practice). Here's a comparison between Ruby-ja and ja-furigana, with お気に入り bolded as per WT:QUOTE:
 (あさ)はまだ ()えるので、さちは () () (あか)マフラー (くび) ()いている。
ja-furigana within <span style="font-size:large;"></span> tags to match size with ruby-ja kanjis:
 (あさ)はまだ ()えるので、さちは () () (あか)マフラー (くび) ()いている。
I looked at these with both Mozilla and IE, and liked plain ja-furigana most, ruby-ja least. Reasons are linking to whole words instead of parts, bolding of the whole word お気に入り without links and consistent font size on the line. I also think the smaller font size looks better, as that's what's normally used for text. It would look even better in Mozilla if お気に入り didn't have furigana at all, but as the furigana is present in the original text, I didn't want to remove it. I think it's important to keep quotations as close as possible to the original. -- Coffee2theorems 01:41, 17 May 2007 (UTC)[reply]
I didn't realize that the furigana was part of the quoted text. In that case, I guess your explanation makes sense. I would say to keep working on it. After you've added enough quotes over time, I think you'll end up tweaking here and there based on different issues that come up. Once you're comfortable with the template (whichever one you go with), consider working it into WT:AJ so that others will know that it is an option, and how it should be used. Congratulations on a very nice innovation! -- A-cai 10:54, 17 May 2007 (UTC)[reply]
This case was rather simple, as the original had furigana for all kanjis. WT:AJA says that one should have the original version, hiragana version and romaji version. The hiragana version is omitted if there are no kanjis in the example except in the word the page is about. I omitted the hiragana version as it would be equally redundant because of the furigana.
I'm not sure what to do about cases where only some furigana is used in the original text. The easy way is to copy the furigana as it is in the original and then additionally have a hiragana version as in WT:AJA. The only "problem" with this is the same as with any text that contains only one or two kanjis - you have to have a hiragana version that only differs in one or two places from the original. Using furigana for such cases too might be nice, but you'd have to indicate which furigana is from the original text and which one was added. Having to use hiragana is another thing that's not too pretty. Examples:
Current WT:AJA way:
shikashi, amerika ni wa ika no futatsu no fīto no teigi ga aru.
Mixed hiragana and katakana (looks much prettier):
shikashi, amerika ni wa ika no futatsu no fīto no teigi ga aru.
Combined original and kana lines with furigana in [] to indicate that they were added by a Wiktionarian and are not present in the original (I think this looks best):
しかし、アメリカには以下 ([いか])2 ([ふた])のフィートの定義 ([ていぎ])がある。
shikashi, amerika ni wa ika no futatsu no fīto no teigi ga aru.
In any case, the ja-furigana template preserves the most information, and can later be automatically changed to any desired format by a bot if merely changing the template is not enough, so I'll use it when necessary. The examples above are just suggestions. By the way, WT:AJA itself uses mixed kana in examples, but says that hiragana is to be used, and there are other places where it talks about hiragana and katakana where I'm not sure if that's what's really meant, so there's certainly some room for clarification in it. -- Coffee2theorems 12:02, 17 May 2007 (UTC)[reply]
Note: I renamed the template ja-furigana to JAruby on Robert Ullmann's suggestion. -- Coffee2theorems 14:32, 17 May 2007 (UTC)[reply]

I experimented with adding rubies not in the original text at 雨降って地固まる by adding an extra parameter to the template, "myruby", and making any rubies with that gray so as to be distinguishable from original rubies. Using the <font> tag is not particularly pretty, but as it's in one place (the template), it can easily be changed. Adding characters to the ruby doesn't look nice on current browsers, and neither does italics, so I went with color. Bold would probably be rendered fine, but it's supposed to be the opposite - is there no "thin" tag? :-) Anyway, opinions on the concept? As you can see, just the original text and romaji take a lot of space for that quotation (and it's one sentence, albeit a long one), and a translation needs to be added too. I think rubies are better than adding kana separately like WT:AJA says. -- Coffee2theorems 18:46, 19 May 2007 (UTC)[reply]

I like the idea, and certainly if rubies are found in texts we quote, we must have the ability to display them. Your work on 雨降って地固まる looks good. Two suggestions: for a 'thin' tag, you are aware of <small>, correct? Second: perhaps the colour should be a bit darker (and perhaps <small> should not be used after all), because the rubies are rather hard to see. — Beobach972 19:47, 19 May 2007 (UTC)[reply]
I'm aware of the <small> tag, but it doesn't fit the purpose for the reason you gave and also because a slight difference in size wouldn't be easy to notice unless you can compare. See お気に入り for a comparison, in there the rubies are from the original text and are not grayed. By "thin" I meant something like reducing stroke width, making the color closer to white, or similar. You could do that manually by assuming that people have the default colors for links and text, and specifying colors closer to white for them in rubies. Or is there a better way? -- Coffee2theorems 22:41, 19 May 2007 (UTC)[reply]
Ah, true. I assume most browsers do display Wiktionary text as the standard black (blue for links) on the white background, so the idea of specifying different colours is certainly one solution. Alternatively, we could leave them regular-size above the quotations, and assume that anyone who could read Japanese script would know about rubies — however, I can see that becoming very confusing if we ever used a quotation that provided rubies above each word. — Beobach972 23:23, 19 May 2007 (UTC)[reply]
The ruby looks good, although a darker gray might be better. On a side note, I think it would be more effective to use a quote that demonstrates the use of the term, rather than an explanation of the term in Japanese. Also, a bilingual attribution line for quotes makes it easier for the reader. Here is a Mandarin example (inspired by one of your recent contributions): 井底之蛙. -- A-cai 23:43, 19 May 2007 (UTC)[reply]
I made it a bit darker and made it use colors for links. It recognizes broken links and existing links, but not visited links, as it's implemented with {{#ifexist:}} and <font color>. Proper implementation would be using CSS, but I don't know how to do that. Good point about the quote, I'll see later if I can find a better one. Google Books doesn't have many Japanese books, and it looks like all the reasonable ones are from the same publisher.. I agree with the attribution too, but I'm not confident I can read the names correctly. It seems like anything goes with Japanese name readings. -- Coffee2theorems 14:45, 20 May 2007 (UTC)[reply]
There's no difficulty with rubies that are from the original text, they can be presented like this:  (かわず)大海知らず. What I'm suggesting is that we could use rubies also for kana that is not from the original text. Let's say the example above is the original text. Then WT:AJA says it should be presented like this:
i no naka no kawazu taikai o shirazu
a frog in a well does not know the great ocean
I'm suggesting we could present it like this instead:
 () (なか) (かわず)大海 (たいかい) ()らず
i no naka no kawazu taikai o shirazu
a frog in a well does not know the great ocean
This is easier to read than a separate line of kana and takes less space on the screen. The only possible problem I see in it is making a distinction between rubies that were in the original text and ones that were added by a Wiktionarian. In the page source the distinction is clear, so it may be that it is not even necessary to show it to the reader (they can check the source if they're really interested), but if it can unobtrusively be shown then I think it'd be nice to do it. -- Coffee2theorems 14:45, 20 May 2007 (UTC)[reply]
One way to distinguish original-rubies from added-rubies might be using background color. Another might be to use parentheses (U.K. brackets) to mark original-rubies, and square brackets to mark added-rubies. —RuakhTALK 16:03, 20 May 2007 (UTC)[reply]
I tried square brackets first, but that makes the rubies longer, and the longer they are the worse they're rendered. I'm sure it will improve in the future, in which case we can use the brackets for added rubies. Background color might be an option, but it has to be very light, or it will look horrible :-) I thought of font faces too, but I don't think there are significantly different ones for hiragana that are commonly installed on computers. -- Coffee2theorems 17:02, 20 May 2007 (UTC)[reply]
Here is a good quote for your entry (from Google books). An example of a relationship growing stronger after adversity is that between Japan and the US after WWII (which is also a topic that a typical westerner should be familiar with in approaching the subject of Japan and Japanese). The quote is talking about this history, and then uses the phrase as part of one of the sentences. In this way, you see the phrase in a context that gives one an impression of how it would be used, but also adds a bit of historical or cultural information as well (I like to throw that in there as well, when possible). -- A-cai 07:04, 29 May 2007 (UTC)[reply]
I used that, it looks quite nice. Didn't translate it though.
I also removed ruby links from the JAruby template, as they're not necessary (they never link to the main entry anyway) and such unnecessary links are avoided in translation sections too. This leaves the ruby color for indicating solely the source of the ruby (original text or Wiktionary editor). The color for the former is of course black, and for the latter I set it to an ugly yellow for now. Feel free to modify the color in the template if you find a better one, to me the differentiation is more important than how it looks :-) -- Coffee2theorems 13:03, 29 May 2007 (UTC)[reply]
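
To sum up the two ways of marking added rubies that came up in this thread (a different color, or square brackets versus parentheses), here is a rough Python sketch. The "added" flag plays the role of the "myruby" parameter mentioned above; the function names, the default color and the use of a span with inline CSS are assumptions for illustration and do not reproduce the actual JAruby template code.

```python
from html import escape


def ruby(base, reading, added=False, added_color="gray"):
    """Emit ruby markup; readings added by an editor get a distinct color."""
    rt = escape(reading)
    if added:
        # Assumed styling: original rubies keep the default text color.
        rt = '<span style="color:{}">{}</span>'.format(added_color, rt)
    return "<ruby><rb>{}</rb><rp>(</rp><rt>{}</rt><rp>)</rp></ruby>".format(
        escape(base), rt
    )


def bracket_ruby(base, reading, added=False):
    """Plain-text variant: (...) for rubies from the original text,
    [...] for rubies added by an editor."""
    opener, closer = ("[", "]") if added else ("(", ")")
    return "{}{}{}{}".format(base, opener, reading, closer)


if __name__ == "__main__":
    print(ruby("蛙", "かわず"))
    print(ruby("大海", "たいかい", added=True))
    print(bracket_ruby("定義", "ていぎ", added=True))
```

Because the original/added distinction lives in a parameter rather than in the displayed text, the visual convention can be changed later in one place.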

Since Template:nav has just about reached maximum reasonable capacity, does anyone else think it would be a good idea to break it up into several more specialized lists? Say, things like Template:nav-europe, Template:nav-major, or Template:nav-ancient? --Ptcamn 09:51, 13 May 2007 (UTC)[reply]

I really would like to see the problems it has caused with Special:Wantedpages fixed first, before making the problem worse. After that has been taken care of, it would be reasonable to reopen this topic. --Connel MacKenzie 10:04, 13 May 2007 (UTC)[reply]
Splitting them up would alleviate that to a certain extent, wouldn't it? --Ptcamn 14:17, 13 May 2007 (UTC)[reply]

I think a sensible approach might be to limit which languages are included in the nav template. Currently, some rather obscure languages are listed, and these languages have very few entries and very few categories. By contrast, Japanese has a huge number of entries and numerous categories. Why not use three criteria for including a language in the nav template: (1) All Top 40 languages, (2) minimum number of entries, (3) minimum number of populated categories. Or we could create a list of languages to be included, wrangle over them, and then require discussion before any further languages are added at a later date.--EncycloPetey 14:56, 13 May 2007 (UTC)[reply]

EncycloPetey, that would be in the opposite direction we (I think) want to go. We (the community, not me) actually do want those obscure languages represented completely here. The idea is to give new contributors indirect assistance by letting them know what categories are desired (in general.) --Connel MacKenzie 16:55, 13 May 2007 (UTC)[reply]
Ptcamn, splitting them up would make them simply harder to find, than they are already now. Trying to determine where the "bug" is in these templates would be considerably worse, if you had to first guess which of three (or more) template families to search. --Connel MacKenzie 16:55, 13 May 2007 (UTC)[reply]

Another problem I just noticed with the nav template. See Category:Birds. At least on my computer, the right-screen box templates above the nav box affect the way text is displayed. It shouldn't. --EncycloPetey 17:05, 13 May 2007 (UTC)[reply]

Actually, I may have been on the completely wrong track with Special:Wantedpages. book seems to link incorrectly to Category:xx:Slang somehow, but looking at template:context/categorize it is not apparent how the {{slang}} is causing that. --Connel MacKenzie 17:43, 13 May 2007 (UTC)[reply]
I don't know how much everybody else has already figured out, but: the Category:xx:Slang in book is due to the {{context}} call -- {{context|transitive|slang}} -- not the plain {{slang}} call. The "xx" bit appears to be due to {{context/checklabel}} using lang=xX and lang=Xx; this in combination with an apparent software change that treats #ifexist the same as a link results in all these things showing up in Wantedpages. {{nav}} just uses ifexist a lot.
The Template:xX, Language:XX, and such are directly due to {{language}}, which gets called via {{idiomatic}} and {{borrowed}}. Cynewulf 15:30, 14 May 2007 (UTC)[reply]
Thank you! I would've saved myself a fair amount of aggravation had I read your (more complete) post, before following down the same path (but getting lost.)
I do recall seeing something in wikitech-l about the ParserFunctions more completely mapping references. (February? March?) If we can't use #ifexist:, what can we use instead? #if:? Or do we have enough to file a coherent bugzilla: now? --Connel MacKenzie 20:04, 14 May 2007 (UTC)[reply]
Ah.. I thought you had moved the discussion here. I noticed that {{borrowed}} is only used in a few articles, and I've been using my sandbox to test things. The template "language" is rather different from my normal C... Anyway, see Special:Whatlinkshere/quux: I put an ifexist:quux in my sandbox and it shows up as a link there. Cynewulf 20:19, 14 May 2007 (UTC)[reply]
Well, the change is here, but it doesn't seem like the sort of thing dev's will want to fix anytime soon. (If the possible branches aren't identified in the link table, they can't correctly follow links that need reparsing by the jobqueue, when some subsidiary template changes.) --Connel MacKenzie 20:31, 14 May 2007 (UTC)[reply]
That is, it'll be a cold day in hell, before they decide to duplicate that table (minus the #ifexist stuff.) I guess our best option is going to be entering redirects (such as Category:xx:Slang.) That'll be a lot of redirects. --Connel MacKenzie 20:53, 14 May 2007 (UTC)[reply]


First, break up nav into {{nav-top}}, {{nav-mid}}, {{nav-bottom}}, with the appropriate parameters (some are used at top, some at the bottom).

Then, for a given topic, say Horses, create template {{topic-Horses}} that takes language code and name as parameters.

This template calls nav-top, then lists the languages that have Horses topic categories, then nav-bottom with the correct parent category(ies).

Each xyz:Horses category page consists of {{topic-Horses|xyz|Xyloz}}. To add a new Horses language topic category, one creates the Category:foo:Horses page, consisting of that template call, and then edits the topic template to add language "Foo".

The big win is that restructuring the topic tree, for example moving Horses from (say) Animals to Mammals requires only editing the {{topic-Horses}} template, not editing all of the xyz:Horses cats.

Scales to any number of topics and languages, can be implemented incrementally, and no more links in Special:Wantedpages. Robert Ullmann 12:51, 23 May 2007 (UTC)[reply]
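
For those who find the template mechanics hard to picture, here is a loose Python model of the proposal. Each topic has one record (standing in for {{topic-Horses}}) that holds the parent topic and the list of languages, and every per-language category page is derived from it. The data structure, the function name and the assumption that the parent is the same-language category are all made up for this sketch; only the language codes "xyz" and "foo" come from the example above.

```python
# One record per topic template; editing this record is the only change
# needed to restructure the tree or to add a language.
TOPICS = {
    "Horses": {
        "parent": "Animals",
        "languages": {"xyz": "Xyloz", "foo": "Foo"},
    },
}


def category_page(topic, code):
    """Derive what Category:<code>:<topic> would display."""
    record = TOPICS[topic]
    return {
        "title": "Category:{}:{}".format(code, topic),
        "language": record["languages"][code],
        # Assumption: the parent category is per language as well.
        "parent": "Category:{}:{}".format(code, record["parent"]),
        # Only languages listed in the record are linked, so nothing
        # nonexistent is ever referenced.
        "other_languages": [
            "Category:{}:{}".format(other, topic)
            for other in sorted(record["languages"])
            if other != code
        ],
    }


if __name__ == "__main__":
    print(category_page("Horses", "xyz"))
    # Moving Horses from Animals to Mammals is a single edit:
    TOPICS["Horses"]["parent"] = "Mammals"
    print(category_page("Horses", "foo")["parent"])
```

The individual category pages carry nothing but the template call, which is why they never need editing when the tree changes.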

Let me just say that, while I have no idea how most of what was just proposed works, I am in favour of any measure which would remove all that crap from Special:Wantedpages. And I think being able to scale the topics to a given language is a good idea as well. I don't believe we'll ever want or need Category:grc:Microbiology or Category:grc:Neuroscience. Atelaes 13:56, 23 May 2007 (UTC)[reply]
If we're wanting to go that route, we can also use a somewhat similar approach to avoid flooding Special:Wantedpages without changing the appearance of the nav-tables, by letting {{nav}} take a named parameter for each language that should be included, and only linking to ones that it's told to: {{#if:{{{lang-grc|}}}|*[[:Category:{{#if:{{{lang-grc|}}}|grc:|}}{{{current}}}|Ancient Greek]]|* Ancient Greek}} instead of {{#ifexist:Category:grc:{{{current}}}|*[[:Category:grc:{{{current}}}|Ancient Greek]]|* Ancient Greek}}. (For that matter, I'm not even sure it needs to be that convoluted; I wasn't sure if it was the use of #ifexist:Category:grc:foo that added Category:grc:foo to Special:Wantedpages, or if it was the linking to Category:grc:foo in the unincorporated branch of the conditional, so I took the safe approach and assumed that either one alone would have this effect, so eliminated both.) —RuakhTALK 18:03, 23 May 2007 (UTC)[reply]
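
The same logic, spelled out as a Python sketch for readers who find nested parser functions hard to follow. The caller says explicitly which languages have a category, and the nav only emits links for those, so no page title is ever probed for existence. The language list and function names here are invented for illustration; the real {{nav}} covers far more languages.

```python
import re

# Hypothetical short list; the real template lists many more languages.
LANGUAGES = [("grc", "Ancient Greek"), ("ja", "Japanese"), ("ru", "Russian")]


def nav_lines(current, enabled):
    """Build the nav bullet list; link only the languages in `enabled`."""
    lines = []
    for code, name in LANGUAGES:
        if code in enabled:
            lines.append("*[[:Category:{}:{}|{}]]".format(code, current, name))
        else:
            # Plain text: no link and no existence check.
            lines.append("* " + name)
    return lines


def link_targets(lines):
    """Collect the link targets the output refers to."""
    targets = []
    for line in lines:
        targets.extend(re.findall(r"\[\[:([^|\]]+)\|", line))
    return targets


if __name__ == "__main__":
    output = nav_lines("Horses", enabled={"ja"})
    print("\n".join(output))
    print(link_targets(output))
```

The cost is that someone has to maintain the list of enabled languages by hand, which #ifexist was doing automatically.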

Should the subcategories in other languages take Category:xx:Proverbs or Category:Gibberish proverbs? --Keene 07:55, 14 May 2007 (UTC)[reply]

That depends on an unresolved issue. If we allow ===Proverb=== as a "part of speech" header, then we should use Category:Gibberish proverbs. If we disallow the use of ===Proverb=== as a header, then we should use Category:xx:Proverbs. This could be resolved with a discussion and vote, if necessary. --EncycloPetey 17:38, 14 May 2007 (UTC)[reply]
Proverbs are specific to each language. Russian proverbs are actually Russian proverbs, therefore the category Category:Russian proverbs. ru:Proverbs would only be needed if "Russian proverbs" were an illogical term. —Stephen 16:57, 17 May 2007 (UTC)[reply]

Necessary tidying up of Wikisaurus templates.

Ages ago, when someone decided we should change WikiSaurus to Wikisaurus, Vildricianus did a bit of changing, but didn't really follow through with it properly, leaving something of a mess. I'm about to try to tidy up the mess, so please bear with me.--Richardb 12:42, 14 May 2007 (UTC)[reply]

See User talk:Vildricianus == change of Template:WikiSaurus-link to Wikisaurus ==
When you felt it necessary to make the change from template WikiSaurus-link to Wikisaurus, why did you not also do the necessary clean up of removing all references to WikiSaurus-link?
It's counter-productive to make changes but only go half-way and leave a completely ambiguous mess behind.
What was so wrong with WikiSaurus-link that the mess that is there now is better ?--Richardb 05:00, 13 May 2007 (UTC)[reply]
Actually, I've discovered an even bigger stuff up you created by what was frankly unnecessary meddling. We now have a template:WikiSaurus and a template:wikisaurus (spot the difference ?) which correspond to two different templates that I created, which were Template:wikisaurus-header, and Template:wikisaurus-link. Those names had clear meanings. Why you renamed them to two very confusingly similarly named templates, which even I can't remember which is which, is beyond me. So I guess I'll do the necessary tidying up to put it all back to being understandable. Thanks!

As I recall, it was a community vote to add the namespace. Many templates didn't work shortly after the new namespaces were added. But if you are willing to go through all 100-or-so WS entries to verify them, then thank you. --Connel MacKenzie 14:42, 14 May 2007 (UTC)[reply]

Hey, I accepted the rename from WikiSaurus to Wikisaurus. Just that I anticipated that no-one would really do all the necessary work to implement it. So now I'm doing the cleanup. Part way through now.
Actually, I missed out on the whole discussion around why we now have a separate namespace. I'd like to catch up on that, and perhaps put a neat summary somewhere easy, like maybe Wiktionary:Wikisaurus/reason for a namespace.--Richardb 16:54, 14 May 2007 (UTC)[reply]
Actually, you participated in that debate, in quite a lively manner: Wiktionary:Beer parlour archive/May 06#Pushing for the definitive WikiSaurus name and namespace. --Connel MacKenzie 21:07, 14 May 2007 (UTC)[reply]
No need to go through all the entries unless you really want to. You can redirect the templates and they will still work. Just don't leave double-redirects. DAVilla 17:02, 14 May 2007 (UTC)[reply]
Why not just let DblRedirBot clear them in a few days? --Connel MacKenzie 21:09, 14 May 2007 (UTC)[reply]
Okay, but a double-redirected template won't work until the bot gets around to fixing it, which makes it look like you have to manually edit all the pages, when in fact it's easier to just make a single modification. DAVilla 22:18, 14 May 2007 (UTC)[reply]
I'm willing to go through them, over a few days, because the two template names were so confusingly similar that the wrong one may have been used. Plus, there is often a little bit of other tidying up to be done along the way. Anyway, I'm totally depressed at the moment and can find nothing better to do, which is why I'm spending time on Wiktionary anyway! Richardb (forgot to sign in!) — This unsigned comment was added by (talk).
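
On the double-redirect point raised above: a redirect that points at another redirect is not followed, so the chain has to be collapsed so that every old name points straight at the final page. A small Python sketch of that check, with made-up template names; this is not how DblRedirBot is implemented, just the idea.

```python
def find_double_redirects(redirects):
    """Return the sources whose target is itself a redirect."""
    return sorted(src for src, dst in redirects.items() if dst in redirects)


def flatten(redirects):
    """Point every redirect directly at its final target."""
    flat = {}
    for src in redirects:
        seen = {src}
        dst = redirects[src]
        while dst in redirects:
            if dst in seen:
                raise ValueError("redirect loop at " + dst)
            seen.add(dst)
            dst = redirects[dst]
        flat[src] = dst
    return flat


# Hypothetical names, used only to show the shape of the problem.
redirects = {
    "Template:old-link": "Template:renamed-link",
    "Template:renamed-link": "Template:final-link",
}

if __name__ == "__main__":
    print(find_double_redirects(redirects))
    print(flatten(redirects))
```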

re: Wiktionary:Votes/2006-09/Wikisaurus semi-protection

Question: Is there a definition somewhere of what "semi-protection" actually means ?--Richardb 16:50, 14 May 2007 (UTC)[reply]

Currently the only middle option is to block unregistered users. DAVilla 16:59, 14 May 2007 (UTC)[reply]
We never really followed through on taking any action. But I have a suggestion (see below) that might help the situation which prompted the vote. --EncycloPetey 17:53, 14 May 2007 (UTC)[reply]

Wikisaurus summer challenge

I'd like to challenge each active sysop and any other interested users to each create three quality Wikisaurus entries by 15 June. If we did that, it would help to swamp out all the crap that has accumulated in Wikisaurus and make the whole thing look much more respectable. --EncycloPetey 17:46, 14 May 2007 (UTC)[reply]

I'll try my hand at Wikisaurus:rich, Wikisaurus:poor, and Wikisaurus:free. I've never participated in a thesaurus project before, but if my years of crossword-doing are good for anything, it's this. ;-)   —RuakhTALK 15:56, 15 May 2007 (UTC)[reply]

Japanese and Han character indexes

The main Japanese index is Index:Japanese. I think it should be replaced with Index:Japanese Kana (compare Index:English), which actually tries to fit the description "This is the Japanese index. It is a list that should contain all words of this language, sorted alphabetically (or by a similar ordering system)." The current index certainly doesn't. The topical list of characters on that page and Index:Chinese topical are extremely incomplete, arbitrary and nobody's been editing them for years (besides, we have categories like Category:Animals for the topics). They're in no way an "alphabetic or a similar ordering system" either. I added the other links from Index:Japanese to Index:Japanese Kana as see also links. Except for the topical stuff there's now nothing in the former that's not in the latter.

Also is there some reason why there are almost no words in the kana index? Is the whole concept of indexing words (as opposed to characters) deprecated and should the pages just be deleted? Even the English index is not particularly actively edited, so maybe the pages are there for the sole purpose of confusing me? :-)

I think the Han character indexes would be better renamed. What and how do you suppose Index:Japanese Kanji indexes? It indexes Han characters by their on and kun readings. Index:Chinese Pinyin is a bit better name as it's at least not called "Index:Chinese Hanzi", but it doesn't index Chinese words - or are those long Compounds sections in entries such as supposed to be indexes (ouch)? There's also Index:Chinese total strokes, but Index:Japanese total strokes is missing (stroke counts differ sometimes). No wonder it is so - Index:Chinese total strokes also lists the characters by their Japanese stroke counts! Why is it then called Chinese total strokes? Mysterious. Yeah, the "Chinese" there means "Chinese character", but even if you guess that you can't be sure before you check how some character such as 浅 where the stroke counts differ is handled.

How about renaming these e.g. like this:

If you look at any Han character entry, there's a subsection in the translingual section called "Han character", so using the same for the indexes would make sense. -- Coffee2theorems 00:32, 18 May 2007 (UTC)[reply]

It seems like indices should be bot-generated, and any information needed in indexing an entry should be available in the entry in bot-readable form. (The bot that handles these could then add {{rf-stroke-count}} or whatnot to entries that don't have that information in a form it can read.) As for your format proposals — those all make sense to me, though I'm certainly no expert on any of CJKV. —RuakhTALK 03:33, 18 May 2007 (UTC)[reply]
All of the Chinese/Han character indexes were generated by NanshuBot, as of the information it had; i.e. with lots of errors since corrected. I moved them all from the Wiktionary: namespace to the Index: namespace, repairing all the links, but not trying to do anything about regenerating the content or thinking about the naming. All of the Han character entries should have all of the information available in the {{Han char}} template (and several others); the remaining exceptions can be found in User:Robert Ullmann/Han/Problems. The topical indexes should probably just go to RFD/O, as you say, we have topic cats now that are better. Robert Ullmann 14:10, 19 May 2007 (UTC)[reply]
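
A rough Python sketch of the bot-generated index idea from this thread: group characters by total stroke count, and collect the ones whose entry lacks a machine-readable count so that a request tag such as {{rf-stroke-count}} can be added to them. The input format and function name are assumptions for illustration; a real bot would read the values out of the {{Han char}} template in each entry.

```python
from collections import defaultdict


def build_stroke_index(entries):
    """Return (index, missing): index maps stroke count to a sorted list
    of characters; missing lists characters with no usable count."""
    index = defaultdict(list)
    missing = []
    for char, info in entries.items():
        strokes = info.get("strokes")
        if strokes is None:
            missing.append(char)
        else:
            index[strokes].append(char)
    ordered = {count: sorted(chars) for count, chars in sorted(index.items())}
    return ordered, sorted(missing)


# Sample input; the entry for 浅 is deliberately left without a count
# to show the "request" path.
entries = {
    "一": {"strokes": 1},
    "山": {"strokes": 3},
    "川": {"strokes": 3},
    "浅": {},
}

if __name__ == "__main__":
    index, missing = build_stroke_index(entries)
    print(index)
    print(missing)
```

A per-language index (Japanese versus Chinese counts) would just be the same function run over a different field.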

Wikisaurus changes

I strenuously object to Richardb's arbitrary "30,000" change to the Wikisaurus criteria. That is highly irrational, as the results from Google (in particular) can vary by more than that from one day to the next.

The relevant section must simply say that terms not meeting Wiktionary:Criteria for inclusion belong on the "/more" subpages, without the wiki-syntax linking the terms ("wikification").

--Connel MacKenzie 14:31, 15 May 2007 (UTC)[reply]

I wholly agree. Numbers of Google hits are not a useful measure of whether an entry should be in Wiktionary or not, and neither should they be for WikiSaurus, IMO. — Paul G 14:50, 15 May 2007 (UTC)[reply]
Why are the criteria for inclusion to be so enthusiastically enforced for Wikisaurus, but so often ignored for the rest of Wiktionary? I refer back to that version of bean that was from a language only used by 2,500 mostly illterate Mexicans. Show me how that meets the CFI. And there are thousands of esoteric examples that would not stand the test of CFI, but are let go. The problems only seem to be when the bowdlerisers get all steamed up. --Richardb 15:42, 15 May 2007 (UTC)[reply]
The CFI talks about "refereed academic journals" which pretty certainly does cover that adequately. You'd do well to spell-check your posts before calling someone "illterate" [sic].  :-)  NB: I share your complaint about esoteric diacritics, but sadly, we don't have a policy-type method of addressing the issue. --Connel MacKenzie 15:53, 15 May 2007 (UTC)[reply]
OK, if you don't like the current wording or whatever of {{Wikisaurus-more}}, then change it to something better. I think I did say "just go for it". (God, did I really just make that "invitation" to CM? Ahh!  :-) --Richardb 15:42, 15 May 2007 (UTC)[reply]
Will do. --Connel MacKenzie 15:53, 15 May 2007 (UTC)[reply]
Waitasec - why is it incorrectly at {{Wikisaurus-more}} instead of {{wikisaurus-more}}? Was that intentional, or a minor error? --Connel MacKenzie 16:01, 15 May 2007 (UTC)[reply]
That was intentional. The name is Wikisaurus, and all the templates begin with Wikisaurus. Now, please, don't you go changing them all on me again!--Richardb 16:15, 15 May 2007 (UTC)[reply]
I've just checked, and there are hundreds of templates that begin with a capital letter, so don't go getting all antsy on me now. Please, pretty please.--Richardb 16:19, 15 May 2007 (UTC)[reply]
Well that is a separate topic, but I assure you: the hundreds of upper-case templates are mostly errors (in comparison to the 5,000+ correctly named/capitalized templates.) --Connel MacKenzie 16:34, 15 May 2007 (UTC)[reply]
If (as you suggest) someone's first exposure to Wiktionary is via Wikisaurus, it would be a tremendous disservice to them, to try convincing them to name/capitalize their templates incorrectly. --Connel MacKenzie 16:36, 15 May 2007 (UTC)[reply]
I don't have a problem with the capital. All the categories are capitalized, the Wiktionary space is capitalized, and Wikisaurus is a proper name. If anything we should be more concerned with reserving a shorter sub-namespace, such as Template:WS- or Template:WS: or something. DAVilla 17:18, 15 May 2007 (UTC)[reply]
Yes, it's much more elegant to have just one set of criteria, and only half as complex to maintain if all links are blue. However, I have no problem with putting the "more" section at the bottom of each page, via transclusion or whatever. You have to understand that half of the visitors to a page are not going to scroll down anyways (which is why I really dislike the TOC). We should discuss this with the intention of taking the proposals to vote. The extra section is not meant to be a list of protologisms, which is what putting it off to another page that "decent" people don't visit turns it into, "decent" in the sense of Richard B. Decent, not Sister Eucharista branded seal-of-decency. It's unfortunate that euphemisms apply so much more often to the taboo. DAVilla 17:50, 15 May 2007 (UTC)[reply]


Are we still using Ethnologue as a reference for language codes and the like? I bring this up because I've just made a verb table for a Languedocien verb (anar), which is not on our language list. I put it under the heading Occitan as I have seen that term used around the site. Ethnologue, however, does not give "Occitan" as a language but, appropriately, divides "oc" languages into six sub-groups. While many words are common to all six, there are sufficient differences such that they ought not be lumped together. In summation, my questions are as follows. Is there any opposition to:

  1. my relabelling of all entries headed with "Occitan" into their appropriate languages as defined by Ethnologue?
  2. adding these six languages to the list of languages and incorporating the use of their language codes?

Thanks! Medellia 22:21, 15 May 2007 (UTC)[reply]

Ethnologue has one classification, but there is dispute as to exactly how to divide ‘Occitan's dialects’ or ‘the languages of the Occitan family’ (whichever term you prefer), and other methods have been proposed... that could cause problems if we decide to be specific. — Beobach972 23:45, 15 May 2007 (UTC)[reply]
A suggestion, in the event we decide to retain and not divide the Occitan header: you could tag all Occitan words with their dialects of usage. (To avoid having to specify every dialect on a universal word, you might say (all dialects), and (all dialects but Lemosin), etc.) — Beobach972 23:45, 15 May 2007 (UTC)[reply]

Ah, I didn't realize that. Thanks for your suggestion. I guess my biggest concern is verbal morphology: using one template for all dialects would be nothing short of a nightmare... but given that many if not most lemma forms are found in all dialects, I could see those pages growing rather large indeed. Medellia 01:21, 16 May 2007 (UTC)[reply]

The same problem occurs in other "languages". There are four mutually unintelligible "languages" lumped under the name Albanian, for instance. Only two of those languages are spoken in Albania. However, if we separate all the different "languages", then how do we reconcile this with the fact that there is an "Albanian" Wiktionary? My best suggestion is to inquire at the Occitan Wiktionary as to how they deal with the issue. --EncycloPetey 02:36, 16 May 2007 (UTC)[reply]
Also, if you need to create different templates for the different varieties, feel free ... I see no reason why anyone would object, if it's impossible to use the same template. — Beobach972 03:26, 16 May 2007 (UTC)[reply]

If there's no objection to multiple templates, I'll just do that. Sadly, neither the French nor Occitan Wiktionaries have made much headway as far as Occitan verbs and the morphology thereof are concerned. I asked (attempted to ask may be closer to the truth!) about dialects at Wikiccionari. (It seems as if Wikiccionari has been abandoned, so that may not pan out in the long run.) For the purpose of comparison, is there much handling of English dialects that I might be able to examine? Thanks again. Medellia 03:52, 16 May 2007 (UTC)[reply]

There's a Category:Australian English, Category:Canadian English, Category:Irish English, Category:UK, Category:US, and probably others that you could look at. --EncycloPetey 03:56, 16 May 2007 (UTC)[reply]

Excellent. Thanks much! Medellia 04:00, 16 May 2007 (UTC)[reply]

I don't know... if verb morphology is substantially different, I think the difference between dialects/languages may be too great to compare to Australian vs Canadian English. Particularly if, as EncycloPetey notes about Albanian, the varieties are mutually unintelligible. Are they? — Beobach972 04:55, 16 May 2007 (UTC)[reply]
May I ask if the meanings remain more or less consistent across the various dialects? If it's largely a matter of morphology, you could use collapsible inflection templates, and have a bunch of them stacked under the Inflection header, with the dialect names as the visible component when collapsed. Or perhaps I'm completely misinterpreting the problem here. Atelaes 05:04, 16 May 2007 (UTC)[reply]
That's a good suggestion for entries where the same lemma form is used in many dialects/languages, and conjugated differently in each. Can conjugation templates be placed inside collapsible templates, though, without all the wikicode clashing? — Beobach972 05:15, 16 May 2007 (UTC)[reply]
That's what I was hoping to do. With any luck, it will work out okay. I did make a collapsible Greek template a while ago, so in theory it should work. Medellia 05:28, 16 May 2007 (UTC)[reply]
Medellia, you can take a look at the German wiktionary to see the approach there. These are the Occitan verbs we have there : de:abaishar, de:aver, de:baissar, de:batre, de:bonhar, de:burar, de:cagar, de:cardar, de:declamar, de:embotir, de:florir, de:parlar, de:programar, de:sentir, and de:èsser. Of those, de:abaishar and de:baissar are probably the most interesting. — Beobach972 05:15, 16 May 2007 (UTC)[reply]

Thank you for the links! I didn't even think to check the German site. Medellia 05:28, 16 May 2007 (UTC)[reply]

I should point out that ISO 639-3 previously distinguished Languedocien (and others) following Ethnologue (which formed the initial basis for ISO 639-3), but just a month ago it was merged into Occitan, and its code was retired.
Re: intelligibility, I don't think mutual unintelligibility is a good criterion for us to use in distinguishing languages, since we're primarily concerned with written rather than spoken language. The colloquial varieties of a given 'language' may well be mutually unintelligible, but in many cases they all use the same standard written language. --Ptcamn 20:22, 16 May 2007 (UTC)[reply]
True, but if a set of 'varieties' do not use the same written language? At that point, is mutual unintelligibility a fair enough criterion? (I am not saying that the varieties of Occitan are mutually unintelligible, of course, keep that in mind — I don't know if they are or not.) — Beobach972 23:09, 16 May 2007 (UTC)[reply]
I found the following on w:oc:Occitan, showing a text in some of the different varieties:
Vivaroal. (1)- Totas las personas     naisson liuras e egalas en dignitat e en drech.     Son dotaas
Lemosin (2)-   Totas las personas     naisson liuras e egalas en dignitat e en drech.     Son dotadas
Gascon (3)-    Totas las personas que naishen liuras e egaus  en dignitat e en dreit. Que son dotadas 
Lengadoc. (4)- Totas las personas     naisson liuras e egalas en dignitat e en drech.     Son dotadas 
Provençau (5)- Totei lei personas     naisson liuras e egalas en dignitat e en drech.     Son dotadas

(which continues...)

(1) de rason e de consciéncia e lor    chal        agir entre elas amb un esperit de frairesa.
(2) de rason e de consciéncia e lor    chau (/fau) agir entre elas emb un esperit de frairesa.
(3) de rason e de consciéncia e que'us cau (/fau)  agir entre eras dab un esperit de hrairessa.
(4) de rason e de consciéncia e lor    cal         agir entre elas amb un esperit de frairesa.
(5) de rason e de consciéncia e li     cau (/fau)  agir entre elei amb un esperit de frairesa. 
I am, as I said, not certain whether those are mutually unintelligible or not, but it certainly shows a difference. (And of course, we do not know the POV; that text could have been selected by oc:WP to minimise differences or magnify them.) — Beobach972 23:09, 16 May 2007 (UTC)[reply]
I see that we differentiate Min Nan from Mandarin by writing the former with the Latin alphabet and the latter with Chinese characters... so that doesn't help this situation... (I was hoping to compare headers or such). — Beobach972 02:45, 17 May 2007 (UTC)[reply]
It's the first article of the Universal Declaration of Human Rights, which, as a well-known text with standard translations into a wide array of languages and dialects, is a pretty common choice for such comparisons. (The Lord's Prayer is perhaps a slightly more common choice, but it has the downside of tending to use archaic language.) So, POV is probably irrelevant (except insofar as the translators might have maximized differences in order to justify their paychecks), though if different translators produced the different translations (I don't know), some differences may result from translators' idiosyncrasies rather than from actual dialectal differences. —RuakhTALK 04:20, 17 May 2007 (UTC)[reply]
Oh, yes, it's the declaration; I was just saying that perhaps the contributors to the Occitan Wikipedia selected it because it uses many words that are similar in all of the varieties, thus supporting their POV that Occitan's varieties are similar, whereas they chose not to include the translation of the phrase the cat is red because it is nega garo in one variety and frach uoge in another (note that I just made those words up), which would demonstrate the opposite... or perhaps the situation is reversed. I was, and am, only providing hypothetical situations, of course. — Beobach972 04:50, 17 May 2007 (UTC)[reply]
Cat, so far as I can tell, is gat or gata, red is roge or roja. Anybody care to create those entries, by the way? — Beobach972 05:00, 17 May 2007 (UTC)[reply]
I know cat happens to change depending on the dialect, as it is, in fact, simply cat in one of 'em. I'll give it a look and then try to get to some entries! Medellia 05:43, 21 May 2007 (UTC)[reply]

Actually, Min Nan can be written in Chinese characters as well. Min Nan wikipedia (and Min Nan wiktionary) decided to use the Latin alphabet because there are a great many Min Nan words that make use of Chinese characters that are not used in Mandarin (and are therefore often absent in conventional Chinese fonts). However, I do include them here on English wiktionary (to the extent possible, sometimes I have had to be creative. See translation section of ticklish), because I think it helps a student of the language to know which Chinese characters (if any) are used. See 翻译 (translate) for an example of an entry with both Mandarin and Min Nan (two separate L2 headers). -- A-cai 11:10, 17 May 2007 (UTC)[reply]

Usually, variances in dialect aren't as much of a written problem as a spoken language problem. The morphological differences warrant consideration, but they are reasonably mutually intelligible. For the purposes of completion, however, I will most likely find a way to present any given word in as many dialects as I am able. For Ancient Greek, we've been putting the dialect name in Template:italbrac, so it seems reasonable to do likewise with Occitan. Thanks to everyone for all of the input! Medellia 05:43, 21 May 2007 (UTC)[reply]

subdividing definitions

Can we bring this up again? I'm not quite sure how people feel about it. It's my opinion that subdivision is very useful for words which have a very large number of (often quite different) definitions. However, recently Connel has (silently) taken them out of the few articles which used them. Fair enough, it's not in ELE. But I maintain that the earlier versions are far clearer. Compare the earlier version of ‘body’ with the current version; or mark then compared to mark now. Is there any place for this in Wiktionary? Certainly many print dictionaries find the device necessary. Any thoughts? Widsith 12:19, 17 May 2007 (UTC)[reply]

Considering the fact that the "subsense" experiments were originally conducted "silently", then when discovered met significant technical objections (i.e. it is unacceptable to use "##") I find it pretty cheeky of you to suggest that I am "silently" removing them. One or two "experimental entries for discussion" this is not. Nevertheless, serious discussion on the technical limitations prohibiting "##" is welcome. --Connel MacKenzie 15:16, 17 May 2007 (UTC)[reply]
It is my experience that most entries with "##" subdivisions of definitions are copyright violations from one of the major dictionaries that uses this format. --EncycloPetey 15:20, 17 May 2007 (UTC)[reply]
Oh, really? Well that's kind of a separate discussion. The point is whether people think we might sometimes want them or not. Widsith 17:19, 17 May 2007 (UTC)[reply]
I would prefer to see something like this:
1. Definition:
    2. Sub-definition.
    3. Sub-definition.
4. Definition:
    5. Sub-definition.
    6. Sub-definition.
7. Another definition, especially:
    8. Something more specific.
But that's hardly feasible right now, and really kind of out there, isn't it? What about:
    Definition class
  1. First def.
  2. Second def.
    Definition class
  3. Third definition.
  4. Fourth definition.
  5. Another definition, especially something more specific.
If only there were a more elegant way to do it. DAVilla 18:33, 17 May 2007 (UTC)[reply]
I think that possible technical work-arounds should be held off, until there is some clear consensus that allowing "subsenses" is desirable. (For the record, I oppose the concept; either it is a definition, or it is not.) --Connel MacKenzie 20:16, 17 May 2007 (UTC)[reply]

Duplication in etymologies

The subject of duplication in Wiktionary has been somewhat of a thorny issue in the past, so here goes as I raise it again.

Is it appropriate that we should duplicate information in etymologies? For example, the (Australian and New Zealand) English "sav" is derived from "saveloy", which in turn is conjectured to be from the French "cervelas", which does not yet have an entry in this wiktionary (nor an etymology in the French wiktionary, for that matter, but my thinking is that it comes from French "cervelle", meaning "an animal's brain used as food").

The etymology for "sav" said "Shortening of saveloy, supposedly from French cervelas" and the etymology for "saveloy" said "Supposedly from French cervelas". Given that we are not a print dictionary, is it helpful to duplicate this information? The etymology of "saveloy" is already given for that word, so does it need to be included in "sav" as well?

When it comes to tracing words all the way back to Sanskrit or Indo-European, there might be a case for giving all information, but all too often an etymology for a word "A" says "From B, from C" while the etymology for "B" says "From D" rather than "From C". Having a chain of links would avoid such potential inconsistencies.

Discuss... — Paul G 09:26, 18 May 2007 (UTC)[reply]

This is a bigger issue than that, but it is one that's been on my mind for some time. Thank you for raising it. I have two decided opinions, and one undecided opinion on this issue. The decided opinions: (1) We should always give the immediate ancestor of a word in the etymology. (2) Whenever possible, we should trace the etymology back to the first word outside the language of the entry (and I mean outside any historical ancestor of the language as well, so if an English word can be taken back into Old French, we should do so, even if it entered via Old English which is technically considered here a different language than English).
My undecided opinion is that we should take the etymology back only this far and no further, leaving an etymological trail on the entry for the other source word. My reasoning is related to the duplication issue mentioned above. Rather than try to put all the information in every location, we should just put the information that's needed for that entry and leave a trail to the other information. Someone may have a good counter-argument for this approach, which is why I say this opinion is undecided in my mind. Either way, I hope we can nail down some formatting and content policies for the Etymology section as a result of this discussion. --EncycloPetey 15:38, 18 May 2007 (UTC)[reply]

I think the issue of duplication is rather a moot point at this point in time. From the vast majority of the etymologies that I've done, Wiktionary contains absolutely no intermediate forms at all. So, perhaps that should be considered for the immediate future. We simply do not have older languages covered very well. Certainly Widsith has created a respectable repository of Old English, and our Latin section isn't too shrimpy, but past that......well we don't have much to speak of. Now, considering what we should do when Wiktionary becomes a bit more thoroughly covered in that area, I think that a decent etymology should contain two things: First, it should have a reasonably thorough coverage of the most recent history, perhaps the three most immediate ancestors. Second, it should go over some of the most important etymons. To give an example of what I mean by this, take a look at epidemic. Most of the words through ἐπιδήμιος mean roughly the same thing (I'm assuming), but we should certainly note that the word originally derives from ἐπί "upon" and δῆμος "people". I think the etymologies of the vast majority of English words are going to behave in a similar manner: Middle English variant 1 < Middle English variant 2 < French variant 1 < French variant 2 < Latin possibly < Ancient Greek. It might be admittedly tedious to include all of the intervening forms, but when a word was originally composed of words whose meaning is now lost, we should include that. Also, keep in mind that, as Wiktionary becomes more comprehensive, if we only show a word's most recent ancestors, a user would have to follow perhaps 5 or 6 links to get to the apical ancestor. I don't think we want that. Atelaes 05:37, 22 May 2007 (UTC)[reply]

Some good points; thank you. Maybe we can satisfy all concerns by somehow having nested etymologies. I don't have the technical savvy (pardon the pun), but these would look something like this, for a word "A":
From [B, from [C, from [D, from E]]]
The bracketed sections might be templates; I don't know. The idea is that everything would be defined once, and it would be sufficient to give "From [B]" as the etymology of A, and this would be expanded by the wiki software to give the full chain of etymons. This could also be made to work for branches (such as ἐπί and δῆμος in Atelaes's example). — Paul G 15:02, 22 May 2007 (UTC)[reply]
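
The nesting idea above can be sketched outside the wiki software. This is only an illustration in Python, not a template implementation: the `PARENTS` table and the function names are hypothetical, and the placeholder words A to E are the ones used in the example above. Each word records only its immediate ancestor(s), and the full chain is built by following those records.

```python
# Hypothetical table: each word lists only its immediate ancestor(s).
PARENTS = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": ["E"],
}


def chain(word, parents, seen=frozenset()):
    """Return the ancestors of `word` as a nested, bracketed string."""
    parts = []
    for ancestor in parents.get(word, []):
        if ancestor in seen:
            continue  # guard against accidental loops in the table
        inner = chain(ancestor, parents, seen | {word})
        # A word with further ancestors is bracketed; a leaf is left bare.
        parts.append(f"[{ancestor}, from {inner}]" if inner else ancestor)
    # Several immediate ancestors (a branch) are joined with "and".
    return " and ".join(parts)


def etymology(word, parents):
    """Expand the stored one-step etymology of `word` into the full chain."""
    body = chain(word, parents)
    return f"From {body}" if body else ""


print(etymology("A", PARENTS))
```

Because each step is stored once, correcting the record for "B" changes the expanded etymology of "A" as well, which is the consistency benefit described above. Branches are handled by giving a word more than one immediate ancestor.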
Hmmmm....that's an interesting thought. I guess I'd have to see that in action to really make a judgment on its effectiveness. Atelaes 04:50, 23 May 2007 (UTC)[reply]
Can it be honestly said that etymology paths don't fork? Don't older languages have homonyms? --Connel MacKenzie 05:43, 23 May 2007 (UTC)[reply]

Archiving this page for misplaced edits

I've just archived away a large chunk of this page. Until automated archiving is restored, I'll go back to the old-style month-by-month archiving.

PLEASE check your edits here, after saving. Section editing is broken today, with section edits somehow replacing prior sections with duplicated sections. No response from devs yet; I'm not certain they are aware (or believe) that it is a problem yet. --Connel MacKenzie 15:02, 19 May 2007 (UTC)[reply]

Context labels to templates

At Connel's suggestion I've been training AutoFormat to convert labels on definition lines like (''label'') to the context template. It adds the language code when needed, and removes duplicate cats. For example:

  • 片口鰯 edit fish to fish|lang=ja, keeps cat because of the sort key
  • reyv edit two labels, removes duplicate cat
  • motivo edit music to music|lang=it, entry now appears in it:Music

motivo is a good example of a knowledgeable user adding the Italian section, but missing out on the context label template, and thus the category, the sort of thing AF fixes. See the doc and talk page, tell me what you think; running for a bit now with me checking all the edits, can leave it running or vote on adding the feature to AF. Robert Ullmann 16:52, 19 May 2007 (UTC)[reply]
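
For readers who want to see the kind of rewrite described above, here is a rough sketch in Python. It is not AutoFormat's code: the regular expression, the function name `convert_line`, and the exact template call `{{context|...|lang=xx}}` are assumptions made for illustration, and the removal of duplicate categories is not covered.

```python
import re

# Assumed shape of a definition line: "# (''label'') definition text".
LABEL_LINE = re.compile(r"^(#+\s*)\(''([^']+)''\)\s*")


def convert_line(line, lang="en"):
    """Rewrite a leading (''label'') on a definition line as a context template."""
    match = LABEL_LINE.match(line)
    if match is None:
        return line  # not a labelled definition line; leave it untouched
    labels = [part.strip() for part in match.group(2).split(",")]
    args = "|".join(labels)
    if lang != "en":
        args += "|lang=" + lang  # language code only when needed
    return match.group(1) + "{{context|" + args + "}} " + line[match.end():]


print(convert_line("# (''music'') A motif.", lang="it"))
```

With `lang="it"` the label becomes `music|lang=it`, as in the motivo example; for an English section the language code is simply left out.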

I've been running it this way for a while, it is adding an interesting number of words to (correct) categories. Any comments? Does everyone think it is a great idea or is everyone just watching the cricket? (rain delay ;-) Robert Ullmann 13:42, 21 May 2007 (UTC)[reply]

Vote on adding this to AutoFormat set at Wiktionary:Votes/2007-05/AutoFormat converting context labels to templates. Feature is tested, has been run on a number of entries, turned off for now. Robert Ullmann 07:06, 23 May 2007 (UTC)[reply]

Parts of speech in an entry - order of sequence?

Is there a policy or guideline specifying how to organize the entry for a word which comes under multiple parts of speech? I'm baffled by the problem of how to decide what the write-up sequence should be for, say, "home" which can be a noun, verb, adjective, and adverb. Specifically, I just got tripped up over the entry for White Russian; I ordered it one way and another editor flipped the parts of speech around. -- WikiPedant 16:42, 18 May 2007 (UTC)[reply]

Alphabetical order. Robert Ullmann 16:48, 18 May 2007 (UTC)[reply]

DISCUSSION MOVED TO BEER PARLOUR: I initially asked this question at the Information Desk, and received Robert Ullmann's response above, but am raising it again here because Mr. Ullmann's answer seems problematic to me. If parts of speech should simply be listed in alphabetic order, then adjectival and adverbial usages would always precede the nouns and verbs, which in a great many cases would mean defining the derivative, secondary, less common, and probably less important usages first (i.e., the usages in which the dictionary user is likely to be least interested). To continue with the example of "home," Wiktionary's own entry lists the parts of speech in this order: noun, verb, adj, adv. The OED's entries for "home" use this order: noun, adj, verb, adv. And the Random House order is: noun, adj, adv, verb. They all (rightly, I think) put the noun first and none of them uses the alphabetic order of: adj, adv, noun, verb. Has this been discussed previously? Does Wiktionary have any written policy or guideline about this? -- WikiPedant 03:47, 20 May 2007 (UTC)[reply]

I agree with you. When the order of the parts of speech particularly bothers me, what I'll sometimes do is abuse our split-by-etymology policy. That's probably not the desired solution, though. ;-) —RuakhTALK 04:47, 20 May 2007 (UTC)[reply]
When it became a decree several years ago, it most certainly was discussed at some length. (Note that it predates WT:ELE, WT:CFI, WT:VOTE as well as even the notion that Wiktionary should have policies at all.) Webster's 1913 (and later) listed the POS in alphabetic order, and any other order would raise additional disputes. (Obviously, for bots, alphabetic is easier to understand than a disputed, changing, random order.) While there are reasons to use alphabetic order, there aren't compelling reasons to use other orders. Each reshuffling, done as a result, ends up with "better" independent, free-standing definitions. --Connel MacKenzie 07:28, 20 May 2007 (UTC)[reply]

For the record, I hate alphabetical order. I think it's much clearer to put the earliest part of speech first, so that if a Noun developed from a Verb then the verb would come first in the entry. Otherwise the Etymology section makes little sense – or you have to include cumbersome explanations there like saying ‘the noun is derived from the verb’. Widsith 09:00, 20 May 2007 (UTC)[reply]

And others disagree. I would like to see the more common usages first, but that too presents difficulties. Alphabetical order is a neutral middle ground in which there can't be argument about what is "correct". As Connel notes, it's also the easiest for a bot to handle. In truth, the fraction of words that exist as more than two parts of speech in English is surprisingly small. Such words are usually the common, short words that carry myriad definitions and complex etymologies anyway. The majority of English words exist as only one or two parts of speech, and it is easy in these situations to note in the etymology the order of development, if it is known. --EncycloPetey 20:32, 20 May 2007 (UTC)[reply]

Connel MacKenzie says above that alphabetic order was a "decree" early in Wiktionary's history. But is it written up anywhere? I've been all over the policy pages and, although admittedly I tend to miss things in my old age, I just can't seem to find it. Can somebody please provide a link to a page? -- WikiPedant 20:22, 20 May 2007 (UTC)[reply]

You are quite right, it isn't in any policy page; it should certainly be in WT:ELE (just takes one sentence). The wikt was very bad for quite a while at writing down policy; something would be hashed out successfully in discussion, and from then on policy would take the form of biting off the heads of newbies while whining that this has been decided. (;-)
In the case where there is serious etymological information, and a resulting "logical" order (noun being derived from verb or whatever) then we have multiple ety sections, and they can be in the logical order. But in all simple cases, we put the POS in alpha order just to have a simple rule. Robert Ullmann 13:37, 21 May 2007 (UTC)[reply]
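
The rule as stated is simple enough to sketch for a bot. The following Python is only an illustration under assumptions: an entry is modelled as a list of etymology sections, each holding its part-of-speech headers, and the function name `order_entry` is made up for the example. Etymology sections keep the order the editor chose; headers inside each section are alphabetised.

```python
def order_entry(etymology_sections):
    """Sort POS headers alphabetically within each etymology section.

    The order of the etymology sections themselves is left alone, so a
    "logical" order can still be expressed by splitting the etymologies.
    """
    return [sorted(pos_headers, key=str.casefold)
            for pos_headers in etymology_sections]


# A simple entry with one etymology section.
print(order_entry([["Noun", "Verb", "Adjective", "Adverb"]]))
```

For a single-etymology entry this yields the adj, adv, noun, verb order discussed above; an entry split as verb first, then noun and adjective, keeps the verb's section in front.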

Well, this issue seems to be a bit of a kettle of fish. To stir it a little more:

  • Connel MacKenzie says that Webster's 1913 establishes a precedent of alphabetic order, but this is not entirely true. I spot-checked a few words in the 1913 edition and had no difficulty finding exceptions: "home" has a sequence of n, adj, adv; "tangle" has the verb before the noun (with a bracketed note [From Tangle, v.]); and "eddy" has n, v, adj.
  • Regarding the point that alphabetic order is easier for bots, surely web resources (like all software) should be designed, first and foremost, for the end user's convenience, not for the convenience of maintenance bots.
  • EncycloPetey may be right that there are not overly many English words which have senses corresponding to more than 2 parts of speech, but the situation where nouns are used attributively as adjectives is not uncommon and it seems to me that the editors of contemporary dictionaries are increasingly disposed to include definitions for the attributive adjectives. In these cases it strikes me as illogical in the extreme to define the attributive first and the noun of which it is the attributive second. (Note: I admit to being a little foggy on exactly how to tell a noun used attributively from an attributive adjective; the OED sometimes uses the mysterious phrase "attrib. passing into adj.")
  • Should there be a written POS policy and not an unwritten "decree"? Surely the answer is "yes".

-- WikiPedant 15:23, 22 May 2007 (UTC)[reply]

Regarding your third point above: we don't have to follow their lead. Latin dictionaries routinely have to address the substantive sense of adjectives, and do so by marking the entry as an "adjective" with the note (substantive). We can do the same thing under a noun header using (attributive). Then it all stays under a single POS header and we don't have to fret so much about order of section headings. --EncycloPetey 15:36, 22 May 2007 (UTC)[reply]
Got to agree
- Make it most usable for humans. The bots (and the nerds) should be considered after the "Average Reader".
And have my usual guffaw about people asserting something is Policy (or a Decree) without it actually being in a Policy Page at all. Same for "standard practice"--Richardb 04:35, 23 May 2007 (UTC)[reply]
You'll note that I carefully did not say it was my decree (it wasn't.) It is with some amusement that I found Wiktionary:Beer parlour archive/July 06#Order for words as English.2C Portuguese.2C Spanish (which has a similar complaint about not finding a link for what then was common practice for well over a year.) Note the comment from (at that time) my adversary. Also, I found Wiktionary talk:Entry layout explained/archive 2005#quickie which seems to be the first time it was formalized in any way. Note that both of those archive pages discuss the "possibility" of official policy pages. --Connel MacKenzie 05:33, 23 May 2007 (UTC)[reply]
And you thought the Vogon Constructor fleet (Hitch Hikers Guide to the Galaxy) was being obscure when they posted their notices about the forthcoming demolition of Earth. But of course we should have looked in those places to find current policy. :-) If it ain't in the current policy pages, it ain't policy. --Richardb 05:50, 24 May 2007 (UTC)[reply]
You know perfectly well your statement is false. The existing policies do not cover all of the standard practices of Wiktionary. Yes, it is a laudable goal to have complete policy coverage, but still a long way off. Now before you go comparing me to some Vogon, I think you should read some of my poetry... --Connel MacKenzie 06:09, 24 May 2007 (UTC)
Sorry Connel. I know perfectly well that the statement "If it ain't in the current policy pages, it ain't policy." is a plain and simple fact. If you assert anything else is policy, or "standard practice", you are just showing how much you really disregard the idea of policies. Because unless you can point to a current policy page supporting your argument, you are then relying on your memory of something that may have been discussed years ago. Which, no matter how good your memory is, you can hardly expect new users to know about. Or are they expected to trawl through every archive of every discussion ever to see if something was discussed, and resolved, in the way you now choose to police it? I'll stand by my simple statement "If it ain't in the current policy pages, it ain't policy." Applying any other "rules" is just autocracy.--Richardb 13:44, 24 May 2007 (UTC)
So, Richard, on which policy page does it say, "If it ain't in the current policy pages, it ain't policy."? --EncycloPetey 16:30, 24 May 2007 (UTC)[reply]
Oh, my two cents on the matter: We should not have any "part of speech" headings; only a single "definitions" heading, with parts of speech identified at the start of each line. --Connel MacKenzie 05:33, 23 May 2007 (UTC)[reply]
And how would you organize the various inflected forms in this scheme? --EncycloPetey 08:03, 28 May 2007 (UTC)[reply]

List of Mirrors?

Do we have any list of mirrors somewhere? Just found [1] which seems to merge several dictionaries, including Wiktionary... which however is referred to as "The Nuttall Encyclopedia".... Compare e.g. [2] with [3] or [4] with [5]. \Mike 12:23, 20 May 2007 (UTC)[reply]

That is a curious misrepresentation. Confer: w:The Nuttall Encyclopædia. I don't see links to, nor GFDL mentioned. Someone care to draft a take-down letter?
I only recently started this list, but haven't figured out where it belongs officially. --Connel MacKenzie 14:54, 20 May 2007 (UTC)[reply]
Pedia used to have a list of "forks and mirrors" - what is the difference? SemperBlotto 14:57, 20 May 2007 (UTC)[reply]
They don't list Wiktionary mirrors, in general, IIRC. --Connel MacKenzie 15:58, 20 May 2007 (UTC)[reply]
w:Wikipedia:Mirrors and forks, for anyone who wants to take a look. It's strictly for mirrors of the English Wikipedia. ~MDD4696 01:09, 22 May 2007 (UTC)[reply]


How do y'all feel about the categories -able, -al, -er, -hood, -ize, -ness, -or, -scope, -ship, and -y? —RuakhTALK 16:11, 20 May 2007 (UTC)[reply]

They're a bad idea. I've nominated the two I found for deletion and received support from Beobach972. The category names simply do not describe the contents. Are they English, French, mixed? Are they nouns, verbs, mixed? It's a bad idea to start categorize words by endings this way. --EncycloPetey 20:36, 20 May 2007 (UTC)
Thanks. Seeing as several editors have expressed feelings that they should be deleted; seeing as no one besides their creator has spoken to defend them; and seeing as their creator has refused to defend them beyond linking to w:Wikipedia:Categories, lists, and series boxes (a guideline that wouldn't specifically apply to this case even if it were a Wiktionary guideline), I'm going to go ahead and delete them. —RuakhTALK 20:46, 20 May 2007 (UTC)[reply]
Note that the links are via templates, so it should be easy to unlink the categories. --EncycloPetey 20:51, 20 May 2007 (UTC)[reply]
Yeah, done, thanks. (References to Category:-scope were actually manual, but there were only two, so not a problem.) —RuakhTALK 21:00, 20 May 2007 (UTC)[reply]
The guy was trying to solve a problem, making a suggestion for improvement, another usage. Seems to me we were mighty hasty in deleting these attempts, without spelling out any alternative approach.--Richardb 04:25, 23 May 2007 (UTC)
Trying to solve what problem? I never saw any discussion of a problem that existed. I never saw the creator of these categories participate in the discussions regarding deletion or the discussion here. I never saw any reason given for the categories to exist or any description of what these categories were supposed to (and not supposed to) contain. No problem seems to have existed prior to the categories, but the categories were badly named and this created a problem. --EncycloPetey 04:42, 23 May 2007 (UTC)[reply]
"Seems" can be deceiving. :-)   Two editors (I and one other) asked him about Category:-scope on his talk page, suggesting alternatives (I suggested we use -scope#Derived terms; the other suggested we use an appendix). His sole reply was to link to w:Wikipedia:Categories, lists, and series boxes, not even quoting any particular part that he thought relevant; and when pressed to be more specific about the purpose of Category:-scope, he did not reply. (This was about a week ago, and he made plenty of contributions between that discussion and the deletions, so it's not like he didn't have a chance.) I'm all for editors' being bold, but when other editors think there's a problem with bold changes, I think they have an obligation to either (1) try to offer some sort of justification or (2) acknowledge that they can't. —RuakhTALK 07:02, 23 May 2007 (UTC)[reply]
Fair enough. You've explained it now. "Seems" there was a bit more to it than just "It's a bad idea to start categorize words by endings this way.", which was the only stated reason for wanting them deleted.--Richardb 13:34, 24 May 2007 (UTC)[reply]

Location of lemma for Category:Korean verbs

Should the lemma entry for a Korean verb be at the usual location, e.g. 하다 (hada), or at the stem of the verb (e.g. 하, ha)? We currently have a mix, e.g. the lemma for 하다 is at the standard location, and that for 막다 (makda) is at the stem entry 막 (mak). Rod (A. Smith) 03:58, 21 May 2007 (UTC)

Wiktionary:About Korean uses the form ending in 다 for its example. That might be worth making explicit, though. —RuakhTALK 05:31, 21 May 2007 (UTC)[reply]
Right. I probably should have mentioned that I created most of WT:AK mere hours ago. I guess it's obvious where I think the lemma belongs. ;-) I'll be bold unless anyone objects to standardizing on the traditional -다 (-da) form. Rod (A. Smith) 07:17, 21 May 2007 (UTC)[reply]
I would support that; it's certainly the standard form in Korean dictionaries. The stem form probably merits a mention (especially if it's already included), but certainly the primary coverage of a verb should be at the "dictionary form" (가다, 하다, 막다 등). -- Visviva 09:21, 21 May 2007 (UTC)[reply]
Oh! O.K., I've fixed the {{policy}} tag to reflect that. —RuakhTALK 15:49, 21 May 2007 (UTC)[reply]
Oops. Thanks. Rod (A. Smith) 15:57, 21 May 2007 (UTC)[reply]

Should Category:Idioms be emptied

Category:Idioms seems to be used concurrently with Category:English idioms for categorizing English idioms. From the subordinate other-than-English language categories I surmise that there exists an unresolved dilemma of whether idioms should be considered a part of speech or placed hierarchically in the *Topics hierarchy under linguistics. Trying to grasp this paradox became a little too much for my head at the moment. Maybe someone else with more cognitive power can solve this puzzle for me? __meco 16:12, 21 May 2007 (UTC)

I think you've succinctly explained the problem already. We haven't decided whether Idiom should be used as a POS header. Perhaps we need a vote? At least a discussion, perhaps. --EncycloPetey 16:34, 21 May 2007 (UTC)
My vote would be no. Some idioms act as nouns ("ace up one's sleeve"), others as adjectives ("ahead of one's time"), etc. We do need some sort of POS header for the idioms that act as full clauses ("the jig is up"), but "Idiom" is not a very good header for these, firstly because it wouldn't include all idioms (since some would go under "Noun"/"Adjective"/etc.) and secondly because there's no reason to assume that idiomaticity will always be the sole criterion for including such clauses — for one thing, it already seems to have been informally decided (in an RFD discussion over "apple pie") that we should sometimes include non-idiomatic senses of often-idiomatic phrases, and certainly it wouldn't make sense to place these senses under an "idiom" header. (I suppose one could argue for separate "Idiomatic clause" and "Non-idiomatic clause" headers, but that strikes me as silly.) Unfortunately, I can't think of a better one — "Adage" and "Proverb" seem too specific (they refer to statements that are taken to be always true, which "the jig is up" is not), but "Expression" and "Turn of phrase" seem too general (they could refer to any idiom), and "Clause" just seems silly. Maybe "Saying"? —RuakhTALK 17:00, 21 May 2007 (UTC)[reply]
WT:ELE says it's a standard non-POS header though. Related headers are "phrase", "proverb" and "phrasal verb". The last one is a bit mysterious, as "verb phrase", as well as "noun phrase" and "nominal phrase" are listed as deprecated, and there's no "phrasal adjective" either. Some kind of clarity in all this would indeed be good. -- Coffee2theorems 17:02, 21 May 2007 (UTC)[reply]
Very true. Some parts of the ELE are out-of-date now. We have a draft list of part of speech headers currently in use at WT:POS, and some of them have been ruled as accepted/rejected, but there are some on the list that are in a state of limbo. Idiom is one that was generally considered acceptable to those who set up the draft, but I think it is closer to being in the limbo category at this point. --EncycloPetey 17:33, 21 May 2007 (UTC)[reply]
To me, it is important to distinguish whether or not a phrase is an idiom. If it is an idiom, I think it deserves an L3 heading, not necessarily because it is a part-of-speech in the narrowly defined sense, but because I want to draw attention to the fact that the word or phrase is not to be taken for its literal meaning. To me, the significant piece of information is that jump through hoops is an idiom, not that it is a verb or verb phrase. If I'm a student of the language, I probably already know that jump is a verb, and that hoop is a noun etc. -- A-cai 22:53, 21 May 2007 (UTC)[reply]
Wiktionary is a dictionary. It is a reference work for the general reader, not for the "student of the language". In any case, we can distinguish idiomaticity without causing inconsistent headers. jump through hoops' current version, with ===Verb=== as the header and an {{idiom}} context tag, is a good example of this, and is much preferable to the reverse. Dmcdevit·t 06:29, 22 May 2007 (UTC)
WT:NOT doesn't say that Wiktionary is not for language learners, should that be included? I don't see any point in limiting the audience that way. -- Coffee2theorems 14:28, 22 May 2007 (UTC)[reply]
Quite. I agree. We need to make sure our dictionary is useable for people besides ourselves (in that people unfamiliar with Wiktionary can understand it easily) and useable for all non-linguists (in that any people wondering what a word means can have that question answered). Dmcdevit·t 00:20, 23 May 2007 (UTC)[reply]
A-cai, I don't understand that line of argument. What kind of crazy language learner would bother looking up jump through hoops if he doesn't already suspect that it might be idiomatic? And what kind of crazy language learner who knows what jump and hoop mean (and knows that in this context jump is probably a verb and hoop is probably a noun, though with idioms it's not always easy to tell) would look up jump through hoops, read the definition (even if it weren't tagged (idiomatic), not that I mind that it is), and fail to realize that this isn't the literal sense? Call me a hopeless optimist, but I think our readers are smarter than that. (By the way, you argue that someone who understands jump and hoop can tell that jump through hoops is a verb phrase; fair enough. But that's not a general property of idioms; anyone who understands shoot, 'em, and up can wrongly "tell" that shoot 'em up is a verb phrase as well.) —RuakhTALK 15:58, 22 May 2007 (UTC)
I agree with you in the case of English entries on Wiktionary. However, I doubt that the general reader would be familiar with any of the 36 entries in Category:Spanish idioms! I have personally created 450+ entries for Category:Mandarin idioms, none of which are intended for a general reader as you define it. A fluent Mandarin speaker probably already knows the definitions for these, and the general English reader (i.e. non-Mandarin speaking) could probably care less. That leaves us with the people who are studying Mandarin. I don't object to how Wiktionary treats English idioms, but do we really need to go out of our way to force English grammar POS rules onto the other foreign languages? -- A-cai 10:46, 22 May 2007 (UTC)[reply]
I don't see that this is English-specific. E.g. I added the Japanese idiom 猫の手も借りたい a few days ago. It is used and inflects like an adjective, because the last word (借りたい) is an adjective. Sure, not all the possible inflections are in use or even sensible, but that has more to do with semantics than syntax. I don't know Mandarin so I can't say how things work there, but in my mind POS is more of a syntactical classification system than semantical, and "idiom" is purely semantical. I'm no linguist though.. -- Coffee2theorems 18:47, 22 May 2007 (UTC)[reply]
The "general reader" does not mean someone who speaks no languages. I'm thinking of someone who should not be expected to understand linguistic complexities. Someone looking up a Spanish or Mandarin idiom should still get information accurate to that language, but I fail to see how the current jump through hoops is inadequate for what you are describing. Dmcdevit·t 00:20, 23 May 2007 (UTC)[reply]
Actually, your example proves my point about how confusing it will be for people, if they have to assign precise grammatical categories to idioms (especially in foreign languages which they are not fluent in). 猫の手も借りたい (neko no te mo karitai) literally means one even wants to borrow the paw (hand) of a cat. Your example sentence would therefore literally translate to At the end of the year, nori producers are so busy that they even want to borrow the paw of a cat. karitai is not an adjective, but a verb to want to borrow. -tai is a verb ending, which when added to the end of a verb, means to want + verb. kari- is the stem form of the verb kariru to borrow. In the example sentence, the idiom itself modifies isogashii busy. Therefore, the idiom is used as an adverb which modifies isogashii. The -hodo particle has the force of so + busy ... as to + want to borrow a cat's paw. In the example sentence, the idiom is used adverbially (not as an adjective), but is not itself an adverb. If anything, it's almost a sentence unto itself. My point is that it's really easy to get confused by this stuff. I think the most important thing is that you know what the individual words mean, and that the phrase as a whole is an idiom. If you want further details about the grammar, each word in the quote should theoretically be linked to its root word. Therefore, in the case of karitai, I would recommend linking as follows: [[kariru|kari]][[-tai|tai]] (or the kanji/kana equivalent, I'm using Romanization here for the benefit of the non-Japanese speakers). -- A-cai 22:50, 22 May 2007 (UTC)
I classified it as an adjective because English idioms are also classified according to how they're used syntactically, and the closest class for that term as a unit was adjective in my mind. Maybe just "phrase" is better.
The classification of the part "karitai" depends, but 国語 grammar (which WT:AJA says is used for determining POS) AFAIK doesn't classify it, it's just the continuative form of "kariru" + the auxiliary verb "tai", i.e. two words like in "may go". Using its JSL grammar classification, verb form (= verb at wikt) seems reasonable to me if we make an entry for it.
More interesting are the cases that aren't phrases. "ikenai" and "naranai" for instance. They are both originally from verbs, but that's just the etymology. I'd say they're adjectives that have a few extra forms (ikemasen, narimasen, naranu, naran), but maybe others disagree. The only classification I've seen for them is "compound" (連語). -- Coffee2theorems 06:27, 23 May 2007 (UTC)[reply]
I should add that in the case of Mandarin, a large number of idioms (some of which have been exported to Japanese) have been in continuous use for several thousand years! I don't think they cared that much about assigning words to specific parts of speech in ancient China. In classical Chinese, it is often true that a word can be used as whatever POS which suits the purpose of the author of a given sentence. I know this drives Western linguists nuts, but Zhuangzi probably didn't stop to consider that 2000+ years later, someone at Wiktionary would be trying to grammatically parse his sentences. In fact, given his outlook on life (that each person views reality subjectively), he would probably find us an odd bunch :) -- A-cai 23:07, 22 May 2007 (UTC)[reply]
That sounds like a general argument for re-evaluating how we assign parts of speech to Classical Chinese words and idioms, and a general argument for being especially careful when writing entries for languages we don't speak fluently (though in my experience we tend to have more serious errors in entries that are written by native speakers, who have a great intuitive understanding of the language but little formal/conscious understanding of linguistics), but it does not sound like an argument for using "Idiom" as a "part of speech" header in languages where words and idioms do belong to fairly consistent parts of speech. If you're bothered by the notion of imposing English-appropriate part-of-speech headers on Classical-Chinese entries, why do you think it's a good idea to impose Classical-Chinese-appropriate part-of-speech headers on English entries? —RuakhTALK 23:34, 22 May 2007 (UTC)[reply]
Perhaps you misunderstood my position. I'm not suggesting imposing anything on any language. I'm suggesting that we do what is appropriate depending on the needs of the language in question. If that means that English uses POS information in the L3 header and idiomatic in the definition line, but that other languages use ===Idiom=== as an L3 header, then so be it. In my opinion, one size does not fit all for every language. If it did, we wouldn't need separate policies for each language (WT:AJ, WT:AC etc.). I understand the need for consistency, but languages are not like mathematics. I have not verified this, but I think that it is safe to say that almost every language has a rule that is broken as often as it is followed :) -- A-cai 02:49, 23 May 2007 (UTC)[reply]
In response to the original heretical statement about the category: no. The category itself should not be eliminated. The idiom category itself is a tool pretty much intended for English-language learners. Native speakers would have to be pretty bored to go rummaging through that category.  :-)
In response to the side conversation that has ensued: I'd object (again) strenuously to reducing the ===Idiom=== heading's role without very specific tests. Even then, I don't see why the part of speech can't be supplied for senses, as appropriate. (I think they are now, generally.) But the utility of the heading "Idiom" is then more valuable, than some effort to eliminate it for no particularly compelling reason whatsoever. --Connel MacKenzie 04:37, 23 May 2007 (UTC)[reply]
Also of note: {{idiom}} currently is putting a small portion of these in the wrong category. --Connel MacKenzie 22:22, 23 May 2007 (UTC)[reply]


Since we're discussing Wiktionary's official parts of speech here, I thought I'd mention the Wiktionary talk:About Korean#Principal parts conversation, where we seem to be settling on "determinative" as the best English translation of the Korean part of speech called 관형사 (gwanhyeongsa). Unless anyone objects to "determinative", we'll replace Category:Korean adnominals and Category:Korean attributives with Category:Korean determinatives. Regarding use of the header "===Determinative===", WT:POS#Other headers in use says this:

These headers are usually language specific. [...] Other headers in use may be added to this table regardless of the warning above not to modify this policy page without a vote [...]

So, is it OK to add "determinative" to the table in WT:POS#Other headers in use and to begin using it or should that be approved here first? Rod (A. Smith) 19:46, 24 May 2007 (UTC)[reply]

That could be confusing, since we already have Determiner as a POS header in use. If they're synonymous, I would use Determiner instead to avoid that confusion and maintain consistency. --EncycloPetey 19:52, 24 May 2007 (UTC)[reply]
Unless there is some reason why Determiner is wrong (and it is the definition at 관형사 ;-), we should certainly be using that, it is used elsewhere. (We should have a table in-between the two standard ones and "others in use" in WT:POS that is specifically the standard headers used by specific languages.) "Determiner" is also more familiar; I'd never heard "determinative" (in the context of grammar) until now. Robert Ullmann 14:45, 28 May 2007 (UTC)[reply]
I'd heard both, and we had a discussion about which is "correct". Determinative is the term used in the Cambridge Grammar of the English Language, but apparently Determiner is the usual term favored in all the other literature. --EncycloPetey 15:30, 28 May 2007 (UTC)[reply]
To elaborate: the Cambridge Grammar of the English Language (often considered the most definitive grammar of English yet) uses "determiner" for the syntactic role, and "determinative" for a word whose primary function is to serve that role; hence "Mary's cat's" is a determiner, but not a determinative (since it's not a word). By my understanding, most of the literature uses "determiner" for both, but the CGEL is apparently not alone in using the term "determinative"; at least, one of my French grammars says that les déterminants are sometimes called les déterminatifs. But the CGEL's definitiveness notwithstanding, I think we're better off with the more common terminology (especially since the CGEL is very much specific to English). —RuakhTALK 07:06, 29 May 2007 (UTC)

Wikipedia-Wiktionary Co-op bot

Look at and tell us what you think, 18:00, 22 May 2007 (UTC)[reply]

Sounds just like my Wikipedia frequency counts thing from a couple years ago. Note the revision history - mostly abandoned now. Keeping synchronized with the latest book additions to the Project Gutenberg corpus seems to be much more fruitful (which is why I focus there, now.) I think you'd be surprised at how few of those terms are missing on Wiktionary, and of those, how few have no entry in the deletion log.  :-/ --Connel MacKenzie 18:34, 23 May 2007 (UTC)[reply]

Sardinian (Campidanese)

I notice that we have a lot of entries with this header. Should they use Sardinian or Campidanese instead? — Beobach972 01:54, 24 May 2007 (UTC)[reply]

Hmmm. w:Sardo campidanese? Well, at least w:Sardinian language has an ISO 639-1 code. It indicates that "sro" is "Campidanese" but that doesn't even have a Wikipedia language description page lurking about (something of a first, in my recollection.) --Connel MacKenzie 06:19, 24 May 2007 (UTC)[reply]
Campidano is part of Sardinia, and the natural adjective (and noun for an inhabitant) would be campidanese. There are several g.b.c. hits in Italian for this word that refer to it as a dialect. SemperBlotto 08:39, 24 May 2007 (UTC)[reply]
The natural adjective for a person from New York is a "New Yorker", and that certainly is described as a dialect, but we don't include that (nor "New Yorkese") as a language heading. I'm confused. Do we want "Campidanese" as a language heading here or should they be "Sardinian"? --Connel MacKenzie 17:22, 25 May 2007 (UTC)[reply]
This may be similar to Occitan, discussed above. Perhaps we could give them the header ==Sardinian== and the context tag (Campidanese). — Beobach972 17:03, 26 May 2007 (UTC)
That's similar to what I've seen done for dialects of Croatian. --EncycloPetey 18:08, 26 May 2007 (UTC)[reply]
From my limited experience, the variation between different dialects of Sardinian is a bit more extreme than Occitan; however, it seems that the same sort of ideas ought to be used. How much work is there to be done in regards to clean-up? I think I could lend a hand. Medellia 22:19, 26 May 2007 (UTC)[reply]
At the latest count, we have 39 articles with a language header of "Sardinian (Campidanese)". It ought to be a simple matter to find them all using the Advanced Search feature on Google to search for an exact match. This search turns up also a number of translation tables with the same parenthetical information. --EncycloPetey 22:26, 26 May 2007 (UTC)[reply]
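(Editorial note, not part of the original thread.) The clean-up proposed above could be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions: the function names are hypothetical, it works on page wikitext already fetched as a string, and it rewrites the language header only. Adding the (Campidanese) context tag to the definition lines, and fixing the translation tables, would still need a human editor.

```python
import re

# Level-2 language header exactly as discussed in the thread.
# [ \t]* (rather than \s*) keeps the match on one line, so blank
# lines after the header are left untouched.
OLD_HEADER = re.compile(
    r"^==[ \t]*Sardinian \(Campidanese\)[ \t]*==[ \t]*$",
    re.MULTILINE,
)


def has_old_header(wikitext: str) -> bool:
    """Return True if the page still uses the 'Sardinian (Campidanese)' header."""
    return OLD_HEADER.search(wikitext) is not None


def normalise_header(wikitext: str) -> str:
    """Replace the old header with plain ==Sardinian== (hypothetical helper)."""
    return OLD_HEADER.sub("==Sardinian==", wikitext)


if __name__ == "__main__":
    # Placeholder page text, not a real entry.
    sample = "==Sardinian (Campidanese)==\n\n===Noun===\n'''example'''\n"
    print(has_old_header(sample))
    print(normalise_header(sample))
```

Level-3 headers such as ===Noun=== are not affected, because the pattern requires exactly two equals signs at the start of the line.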

I am currently in the process of tidying up Category:Romanian words inherited from Latin and have found many references to Popular Latin. I assume this means Vulgar Latin? Just wanted to check before I plough into any more tidying up. --Williamsayers79 07:38, 24 May 2007 (UTC)[reply]

FYI the category is being moved to Category:ro:Latin derivations in line with all other foreign language etymology categories. I am applying Etymology language templates where applicable and making note of the inherited status in the etymologies.--Williamsayers79 07:38, 24 May 2007 (UTC)[reply]
I took a look at a few of the entries in that category and would stand by your supposition. I wouldn't, however, generalize to the point of saying that "Popular Latin" is a direct substitute for "Vulgar Latin" in all cases. If you have any problems with the Latin, I could try to help out. Best of luck! Medellia 22:19, 26 May 2007 (UTC)[reply]

Requests for verification

So for the past few days I've been among those cleaning out old entries on Wiktionary:Requests for verification that have been languishing for months, and one thing I've noticed is that people seem to be more interested in citing the iffier words. This is understandable — it's no fun citing a word when you don't have to dig at all to find the good cites — but it's counter to the purpose of RFV, since it means that of the CFI-satisfying words that are brought to RFV, the words that are passing are the ones least important to include. This is somewhat mitigated by the fact that some editors, having brought a word to RFV, and being told by fellow editors that said word is actually really common in <region>, will withdraw that RFV (removing the need for quotations), but not all editors do this, and I don't think they should really be expected to.

I'm not sure what can be done about this, besides making a point of encouraging editors who post "Keep" messages, and who say that a given word is easily cited, to put some cites where their fingers are. It doesn't seem that we can amend our policies to allow words on RFV to pass without cites; our CFI do allow for words in "clearly widespread use", but I don't see any cite-less way to establish that for a word that's been RFV'd (since obviously the editor who RFV'd it doesn't think it's in widespread use).

Any thoughts? Have I been going about this the wrong way? Should I be giving advance notice before marking "RFV failed" words that people have argued for but failed to cite?

RuakhTALK 06:07, 25 May 2007 (UTC)[reply]

  1. Thank you for taking on this enormous task.
  2. If there is general agreement that a term is "in widespread use" on the WT:RFV page's discussion, then yes, it should be rfvpassed without citations. That has been the custom to date. Does the wording at the top of RFV need to be amended to reflect this more clearly?
  3. How much time an entry should have, depends on the amount of controversy surrounding it. If an entry clearly has some who "like" it and some who don't, and citations haven't been provided, a one week warning of impending deletion is "fair."
  4. Given the enormous backlog, and your diligence, I'm not prepared to make a fuss about any of my favorite terms you may have deleted. RFVfailed simply means the citations now have to be dug up, before re-entering those entries. For any that were claimed as "in widespread use" it should be that much easier...
  5. I'm not sure where DAVilla et al. left off with the criteria for idiom forms. My recollection was that there was general agreement that only the most common form of an idiom required citations (particularly when the other forms were redirects.) I haven't checked CFI on that, this evening, but I thought it mentioned that case specifically.
  6. Keep up the great work.
--Connel MacKenzie 07:50, 25 May 2007 (UTC)[reply]

Individual's names

A while back, I noticed some contributors entering country information on the various "Names" Appendix pages. As such information is seemingly harmless, I haven't been watching those edits.

Tonight, I noticed numerous individuals' full names have been entered (mostly, for living people.) There is no justifiable way I can see for WMF to take on that sort of liability, in a dictionary.

Do we need a vote for prohibiting mentions of living persons (outside of Quotations sections of entries in the main namespace) or is there some possible justification for this sort of thing?

--Connel MacKenzie 07:37, 25 May 2007 (UTC)[reply]

Nope. Remove (except for cases like Shakespeare, Robin Hood, Jules Verne perhaps). H. (talk) 14:30, 28 May 2007 (UTC)[reply]

I am the one who added the "country", "gender", and "individual's full name" information to the "Names" Appendix pages. I use the Names Appendix as a kind of notebook for all the names I find in different media (books, TV, newspapers, magazines, Internet sites, real life, and the spam mail I receive daily). My goal is to make a complete list of names from all the countries in the world, with descriptions, meanings, etymology, forms, translations, diminutives, social use and so on. When I have added all my notes and the information from all my books about first names (approx. 15 meters), I will make references from the name entries in the Appendix to name articles in Wiktionary. When there is a good and complete reference, I will remove the country/gender/full name information behind the entries in the Appendix. By the way: many of the individuals' full names are fake, as they are taken from spam mail.

Best regards, Alasdair (

No. Sorry, but even hidden in comments, including people's full names, especially example celebrity's names, doesn't work. --Connel MacKenzie 04:52, 29 May 2007 (UTC)[reply]

Present participles as adjectives

Discussion moved to Tea Room

Cantonese/Mandarin category names

I am trying to fit Category:yue:Numbers into the existing category scheme, however, when I attempt to use the {{nav}} template, I find that there exists no Category:Cantonese language, only Category:Cantonese. I'm sure this is connected to a subject upon which much has been said, however, as I am a complete stranger to the Chinese language, I bring it up here. __meco 07:31, 27 May 2007 (UTC)[reply]

Oh my. I see we still have Category:Chinese language. Perhaps the categories were overlooked during this year's reorganization of "zh" material? --Connel MacKenzie 07:38, 27 May 2007 (UTC)[reply]
At first glance it appears to me that Chinese language is currently organized across two axes. This is inconvenient; however, I am oblivious to any ill effects of simply purging one of them (Cantonese/Mandarin I suppose). __meco 07:57, 27 May 2007 (UTC)
I think there should be a Category:Chinese language as an overall category for Cantonese, Wu, Guo yu, Min nan and so on. Pistachio 14:23, 27 May 2007 (UTC)[reply]
The reason that I named it Category:Cantonese and not Category:Cantonese language is that there is a raging language/dialect debate within the Chinese speaking community (see: w:Chinese_language#Language_versus_language_family). Rather than engage in such a debate here on Wiktionary, I avoid the issue altogether by simply calling the category Cantonese rather than Cantonese language or Cantonese dialect. Therefore, the intention was to have anything like Category:yue:Numbers located in Category:Cantonese. Category:Cantonese can be found in Category:Chinese language because Cantonese is a Chinese language, just like Mandarin and Shanghainese are also distinct Chinese languages. If anyone feels strongly enough about modifying the current scheme, I would recommend creating a separate Category:Mandarin, which would also be under Category:Chinese language (or possibly Category:Chinese languages). I have not done so to this point, because most lay people think of Chinese as being synonymous with Mandarin. -- A-cai 21:18, 27 May 2007 (UTC)[reply]
I was partly bringing this issue up because neither Cantonese nor Mandarin are included in the {{nav}} template (include Shanghainese now that I have become aware of its existence). Could we perhaps include these without risking inciting a hardened conflict? However, without the word "language" added to the category name, this is perhaps difficult from a purely technical standpoint? __meco 07:00, 28 May 2007 (UTC)[reply]

Announcing Occleve - The Open Content Learning Environment

Occleve (the Open Content Learning Environment) is a computer-based learning system for mobile phones which stores its test data in XML on a MediaWiki backend.

Occleve lets anyone create and edit tests on the wiki, then load them into the mobile phone testing software. It's a totally open system: the mobile phone software is GPL-licensed, and the content is GFDL-licensed. As far as I'm aware this is the first time MediaWiki has been used like this: as an XML backend serving a mobile phone application.

I live in Shanghai, China so at the moment nearly all the tests are for English-Chinese vocabulary. This explains the system's other name, PocketChinese. But there's nothing stopping you creating tests for other language pairs, such as English-French, English-Arabic, etc, etc... In fact the system already contains a few tests on English-Hindi, English-Korean, and English-Shanghainese.

In the longer term, my ambitions for the system extend beyond language learning. For example, I'd like to add MathML rendering support to the client, so it can be used for learning maths equations.

How does this relate to Wiktionary, and doesn't it duplicate it? I think these are the important differences:

  • Occleve is a computer-based learning system, with a focus on testing, and the wiki is specifically designed to act as its backend.
  • The Occleve wiki stores its data in XML, not wikitext.
  • Occleve is not just about learning a language (eventually).

I'd be very interested to hear people's feedback, objections, criticisms, flames, etc, and most importantly please get involved! --Joe Gittings 09:45, 27 May 2007 (UTC)[reply]

PS. This post to the mediawiki-l mailing list gives a few more details... --Joe Gittings 09:53, 27 May 2007 (UTC)[reply]

Correct use of the Category:iw:Parts of Speech

Is this wrong: Category:zh:Parts of speech, and this right: Category:ro:Parts of speech?

This touches upon my inquiry above about the Chinese hierarchies. Is there an alternative guideline for categorizing Chinese entries as opposed to other languages'?

(Btw: I used iw in the header as a placeholder. Is there any precedent here that I'm not aware of?) __meco 12:10, 27 May 2007 (UTC)[reply]

The second example is correct. The Category:Parts of speech and its variants are for words that are names of parts of speech, not for grammatical categorization in general. Most such categories have been cleaned up, but Chinese categories are more difficult. A number of inconsistencies still exist there. --EncycloPetey 23:49, 27 May 2007 (UTC)[reply]

Japanese Proper Names on Kanji Pages

Should examples of given names and such be somewhere other than in the "compounds" section? I think they should still be on the kanji pages, but wouldn't it be better to put them in "Names," "Places," "Geography," etc. sections? I think it would be a little more organized to do it this way. --Hikui87 21:12, 27 May 2007 (UTC)[reply]

It would be under ===Proper noun===, which is the same way we handle English given names (take a look at any Tom, Dick or Harry :). -- A-cai 21:24, 27 May 2007 (UTC)[reply]
That wasn't the question asked ... :-)
Trying to subdivide the compounds listed for a given character gets very messy very quickly. There are some entries that do subdivide by the pronunciation/reading of the character in the given compound. If one breaks them down into lots of sections, it becomes harder, not easier to find than if they are just listed in standard order. Mostly someone wants to find a link for an entry they want to look at. Robert Ullmann 14:30, 28 May 2007 (UTC)[reply]
Based on Robert's response, I may have misunderstood the question. I was thinking that you were referring to a situation where a single kanji character can also be someone's given name. An example of this would be Takeshi Kaneshiro from the movie House of Flying Daggers. His given name of Takeshi () is a single kanji character. So on the page, you would have:


===Proper noun===
  1. a male given name
I now think the question had more to do with given names that are longer than one character. If this is the case, I think ===Derived terms=== is as good as anything. I don't think separating them out according to category would be particularly helpful. I agree with Robert, a simple phonetic ordering is usually the best way. -- A-cai 18:54, 28 May 2007 (UTC)[reply]

Look at this page "" and see if the Japanese section looks good or bad. I would like to point out that I did NOT add all those compounds, but I did transliterate them. Personally, I think there shouldn't be so many on one page. Some of the compounds that were there were nothing but names (some weren't even words). The thing is that names are special words anyway. We may say "Use the John," or "Jimmy the lock," but those are by far exceptions. They shouldn't be grouped together. If nothing else, just remove them.

It also needs a better way to show gender. One solution could be to have just the names in kanji on the page, and then link each name to its own page where each transliteration and gender is given.--Hikui87 04:14, 29 May 2007 (UTC)[reply]

It certainly does not look good. Sometimes content should be preferred over looks, but I agree that this is not such a case. It's not an index or category page, so there's no need to list every possible compound. FWIW, the transliterations make it look worse. I can see that there's some point in having the romanizations (in a list one click away may be too far away), but the kana versions are just clutter. It might be interesting to use JAruby with the romaji in such lists (A-cai made this typographically heretical suggestion for another purpose earlier :-), e.g. 水曜日 (suiyōbi) or 水曜日 (suiyōbi) (no need to link to the romaji version, as it's not the main entry)(update: the ruby link option is gone now). -- Coffee2theorems 05:44, 29 May 2007 (UTC)[reply]
Reading WT:AJA, the compounds list might actually be(!) intended as a comprehensive list. It's not impossible to do that, but a list of the most common 50 or so would be more useful. Counting kanji occurrences in edict entries (total 120138 entries in my version, no names or jargon) with a quick script, the ten most common are (1736), (1640), (1279), (1205), (1175), (1122), (1092), (1054), (1051) and (1036). I think such lists are big enough that they merit their own page, but collapsing them like at at least doesn't affect the readability of the page. Simple collection of all possible words is work for a bot though, no need to waste the time of humans on it. -- Coffee2theorems 06:44, 29 May 2007 (UTC)[reply]
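The kind of count described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script that was actually used: it assumes the headwords have already been extracted from the edict file into a list, and the names `is_kanji`, `kanji_entry_counts` and `sample` are hypothetical. It counts, for each kanji, the number of headwords that contain it.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_entry_counts(headwords):
    """Count, for each kanji, how many headwords contain it at least once."""
    counts = Counter()
    for word in headwords:
        # Use a set so a kanji repeated inside one headword is counted once.
        counts.update({ch for ch in word if is_kanji(ch)})
    return counts


# Hypothetical sample standing in for the headwords of a dictionary file.
sample = ["水曜日", "日曜日", "日差し", "大小", "すいようび"]
counts = kanji_entry_counts(sample)

# The two most frequent kanji in the sample, with their entry counts.
top = counts.most_common(2)
print(top)
```

Sorting the result with `most_common` gives the "most common kanji" ranking; choosing a cutoff such as the top 50 compounds per kanji would be a separate step.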
You might also consider doing something like:
A counter argument might be that you want red links on the page so that people will know to add them. My counter-counter argument would be that it is better to place requests for new entries in a more centralized location such as Wiktionary:Requested_articles:Japanese. -- A-cai 07:14, 29 May 2007 (UTC)[reply]

I agree that it doesn't look good. I was also going to suggest keeping compound examples to <50. Look at for a better version of that system. Also look at and for two extremes of compounds usage. I tried to meet somewhere in the middle. Either way, a more standardized system is needed.

I can see that taking the kana entries out might help clean it up a bit, but they're important entries too. Using the example from earlier, 水曜日 is proper Japanese, but すいようび is by no means incorrect nor uncommon usage.

I like the collapsible compounds idea, as it allows for thoroughness yet keeps the page clean. I think this would also allow both kana and romaji versions to be there, though the definitions as in is still too much.

Another idea to keep compound entries more concise would be to not add a compound or its transliteration unless a page already exists for the entry. Pages for other compounds can be requested or created by users before adding them to the page.--Hikui87 16:46, 29 May 2007 (UTC)[reply]

The kana versions are there as a reading guide, and an unnecessary one at that, as there's already romaji. If the kana versions were there for some other reason, then they should be given separate items on the list, but that wouldn't make sense as then you'd be listing spellings of words that don't contain the kanji itself (e.g. there's no 水 in the spelling すいようび). Also, there's no need to include more than one spelling of a word, especially ones not containing the kanji in question. E.g. if the kanji is 日, then there might be some point in listing 日差 and 日射し in addition to 日差し, but 陽射, 陽差し and ひざし are quite unrelated to 日. The extra spellings are just one click away anyway.
Collapsible lists can hide a mess, but they won't make it go away, so I wouldn't use them as a reason for using -style lists. It might work if you replaced e.g. "大小 (だいしょう, daisho) various sizes; large and small; a pair of swords worn together" with something much shorter such as "大小 (daishō) large and small" etc., but then you'd have extra work in choosing a single, core meaning, which is a bit much for simple lists of compounds (I think such decisions should be left to the entries themselves).
Postponing the addition of compounds to lists until the entries exist is not a permanent solution, nor do I think it's really a good one. A compounds list which doesn't list common compounds is simply bad no matter what the reason, and making their improvement more difficult (i.e. you have to create the entries first) does not sound like a good idea. People (especially volunteers and passers-by) are lazy, and if they have to do more work than they're inclined to at the moment, they'll just go away without doing anything at all. -- Coffee2theorems 21:27, 29 May 2007 (UTC)[reply]
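The rule suggested above (list only spellings that actually contain the kanji of the page) can be illustrated with a small sketch. The function name `spellings_for_kanji` and the variable names are hypothetical, and the variant list is just the example from the comment, not data from any real entry.

```python
def spellings_for_kanji(kanji, spellings):
    """Keep only the spellings that contain the given kanji."""
    return [s for s in spellings if kanji in s]


# Spelling variants of the same word, taken from the example in the comment.
variants = ["日差し", "日差", "日射し", "陽射", "陽差し", "ひざし"]

# Only the variants written with 日 would be listed on the 日 page.
listed = spellings_for_kanji("日", variants)
print(listed)
```

The spellings that are filtered out would still be reachable from the entry for the word itself.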

Yeah, that makes sense. After reading WT:AJA a little more closely, while any compound that includes the kanji or kana can and should be included, having romaji forms of kanji compounds doesn't seem to fit that description. It does say that all forms of the term (kanji, kana, romaji, and English translation, or all that apply) should be included, but it probably just applies to the entry itself.--Hikui87 02:41, 30 May 2007 (UTC)[reply]


Based on a discussion that started on my user talk page, I've moved a discussion about format of the Descendants section to a starter page for Wiktionary talk:Descendants. A key issue is how to distinguish direct descendants from those resulting from borrowing. --EncycloPetey 15:53, 31 May 2007 (UTC)[reply]