This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives


Romanized Korean, again

Hunmin Jeongeum, by the letter of the law at least, has failed RFV. We normally give entries more than a month, as people don't spend that much time trying to cite them, and the backlog's so big that it takes two months just to get to an entry. But this entry does seem totally unattestable. In fairness, it would seem right to RFV all Romanized Korean before adding to WT:AKO that we don't accept Korean in Latin script. My practical side says that's just a waste of time, delaying the inevitable. Thoughts on this specific issue?

FWIW Tatar has a similar issue where its Latin forms, which are listed in the Russian and Tatar Wiktionaries, don't seem to be attestable, or at least not very easily. That's a separate issue; let's deal with Korean first and have a separate discussion on Tatar later if we want to. Mglovesfun (talk) 13:33, 2 May 2011 (UTC)

IMO, a romanized form should only be given an entry if a) the language is sometimes actually written in Latin script for non-pedagogical communicative purposes (i.e. no language textbooks or language-learning Usenet groups), and of course b) the specific term is attestable to the usual standards. So unless there is some hidden subculture of romanized-Korean-users somewhere, I would agree that this should be added to WT:AKO and RFV bypassed. -- Visviva 03:10, 9 May 2011 (UTC)
FWIW yes Latin terms do creep into non-Latin languages in the original Latin script. Usually product names in my limited experience; I used to have a bottle of shampoo where the product name was written in the Latin script and everything else written in Greek script (for Greek, that is). So, if there are no objections in the next 24 hours, I will start removing these, but not PC방 which doesn't seem to be a transliteration, just a 'borrowing' of the English word PC. --Mglovesfun (talk) 12:57, 10 May 2011 (UTC)

British English Portal

Is there a way to access the site via a British English portal? I want students to use it but I don't want them pulling up US spellings for example. I guess they should be trained to look to see where the word is being used, but the local spelling does identify variants where they exist? If this is not the case, would it be possible to build this functionality in? —This unsigned comment was added by (talk) at 07:41, 4 May 2011.

Sorry, but I don't think that what you are suggesting is feasible. Certain entries, splendour for example, say that they are alternative spellings of the US spelling (for editing convenience: you only need to make changes in one place) so your students would have to visit the US spelling anyway. For some other entries, it's the opposite way around and some don't do this at all. —Internoob (DiscCont) 22:31, 4 May 2011 (UTC)
Your students need to know that the spelling of English words varies around the world (and to learn not to worry about it too much). In this Wiki people use the spelling that they find most natural; changing a (valid) spelling in an existing article is normally frowned upon. SemperBlotto 07:10, 5 May 2011 (UTC)

blocked users' editing their own talkpages

A new feature added to enwikt recently is that a blocking admin now has the option of allowing the blocked user to edit his talkpage. A few points:

  1. We don't have policy on whether/which blocks should allow the blocked user to edit his talkpage. Do we need such policy, or can it be left to the blocking admin's discretion? (IMO, the latter.)
  2. The enwikt practice has always been that blocked users cannot edit their own talkpages. But the default block setting (I think) is that they can: the blocking admin has to specifically click a checkbox to block the user from editing his talkpage. IMO this default should be changed so that the default is in line with enwikt practice. Is there a way, technically, to change the default? (However, if the answer to my previous question is that people think that blocked users should generally be allowed to edit their own talkpages, then this point is moot.)
  3. If someone blocks a user and allows him to edit his own talkpage, then the user might do so to protest the block. Therefore IMO either (a) the blocking admin should, for each such block, watch the blocked user's (talk)page or (b) we should have a message appear, somewhere any blocked user can see it, that indicates that no one necessarily is watching his talkpage.
  4. (This one's minor, and just to satisfy my curiosity.) Anyone know why it was added? Was there a bug report filed for this feature? I haven't heard any outcry for it, and have seen people's saying that it's not necessary. (OTOH I don't recall people's saying it's a bad thing.)

Thoughts?​—msh210 (talk) 18:31, 5 May 2011 (UTC)

I like the principle that blocked users can edit their own talk page, gives them a reasonable chance to protest. But if they abuse it, use it for insults, spam links, etc. they should forfeit that right. --Mglovesfun (talk) 18:57, 5 May 2011 (UTC)
That is good to know, even though it doesn't answer any of my four points.​—msh210 (talk) 19:43, 5 May 2011 (UTC)
I agree with what Mglovesfun said. --Daniel. 16:07, 6 May 2011 (UTC)

That's partially my fault, see meta:Requests for comment/Blocked users and talkpage access. IMO, we can always write some javascript to have the checkbox always automatically checked to restore the default. TeleComNasSprVen 08:08, 6 May 2011 (UTC)

Sounds good to me. AFAICT that would require adding the following code to Common.js:
if (wgCanonicalSpecialPageName=='Blockip' && document.getElementById('wpAllowUsertalk') && !document.getElementById('wpAllowUsertalk').disabled) {
Those who actually know JS are encouraged to review that code, though.​—msh210 (talk) 16:50, 6 May 2011 (UTC)
And I see Yair has; thanks, Yair.​—msh210 (talk) 16:53, 9 May 2011 (UTC)
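
A minimal sketch of how the snippet above might be completed. The original line is cut off after its opening brace, so the body here is an assumption: it supposes that an unchecked wpAllowUsertalk box means the blocked user cannot edit the talk page, and that the goal is to make that the starting state on the block form. The helper name applyTalkpageDefault is hypothetical; only wgCanonicalSpecialPageName, 'Blockip' and 'wpAllowUsertalk' come from the discussion.

```javascript
// Hypothetical helper (not from the discussion): sets the talk-page checkbox
// on the block form to a chosen default. Returns true if the default was applied.
function applyTalkpageDefault(specialPageName, checkbox, allowByDefault) {
  // Only act on the block form, and only if the checkbox exists and is usable.
  if (specialPageName !== 'Blockip' || !checkbox || checkbox.disabled) {
    return false;
  }
  checkbox.checked = allowByDefault;
  return true;
}

// Browser usage, guarded so the file also loads outside a wiki page.
if (typeof document !== 'undefined' && typeof wgCanonicalSpecialPageName !== 'undefined') {
  applyTalkpageDefault(
    wgCanonicalSpecialPageName,
    document.getElementById('wpAllowUsertalk'),
    false // assumption: unchecked means the blocked user cannot edit the talk page
  );
}
```

Keeping the decision in a small function means the admin can still tick or untick the box by hand afterwards; the script only changes the starting state.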

the older spellings…

¶ On the subject of alternative spellings: is it acceptable if we may re‐locate the bulk of content to the etymologically consistent form? I have been doïng this for a little while (example) but I have not yet received any warning or block. Is this considered “correcting” the spelling? --Pilcrow 08:35, 6 May 2011 (UTC) ¶ Additional: I also changed a couple spellings in while I was locating content from civilize to civilise; compare [[1]] with [[2]], but I intentionally did not change the other instances of the word. Will I be blocked? --Pilcrow 08:42, 6 May 2011 (UTC)

I think we have to be careful with statements like "etymologically consistent". I assume in this case it's because criminalise and civilise are from French, and they only have -iser and not -izer. Etymological correctness isn't part of CFI - hence pænal even though the Latin is pœnalis. We are descriptive not prescriptive (WT:NOT). The wider issue here, unresolved for a long time, is what to do with alternative spellings like civilise/civilize when they are both very common. --Mglovesfun (talk) 08:55, 6 May 2011 (UTC)
Oh God, here comes the English vs American issue again... TeleComNasSprVen 08:58, 6 May 2011 (UTC)
IMO, we should eventually have the possibility of synchronized entries, where editing either spelling's entry causes an identical edit to instantly happen to the other, without having it "based" in either entry. --Yair rand 08:59, 6 May 2011 (UTC)
Pilcrow, please stop. There are a few things to say about this. Firstly, "-ise" spellings are (apart from a few words) not used in the US, and therefore regardless of etymology this falls under a general US/UK distinction, on which the community has a well-established convention that the earliest-created entry of a given pair will be taken as the lemma. Also, you should be aware that "-ize" is perfectly good UK English and is, in fact, preferred by the OED. (They in fact point out that "the suffix itself, whatever the element to which it is added, is in its origin the Greek , Latin -izāre; and, as the pronunciation is also with z, there is no reason why in English the special French spelling should be followed, in opposition to that which is at once etymological and phonetic.") So if you are going to play the etymology card, don’t be surprised if it backfires on you. Ƿidsiþ 09:07, 6 May 2011 (UTC)
¶ So could you list any native Latin words which were suffixed with ‐izare? --Pilcrow 09:59, 6 May 2011 (UTC)
Well it is mostly a late-Latin suffix, but there are certainly many examples. , are two that come to mind. Ƿidsiþ 10:07, 6 May 2011 (UTC)
“Organize” is related to the word ὄργανον. You may be correct with “moralizare”, but ‘moral’ was apparently inspired by a Greek word, and the ultimate origin is not certain. ¶ Regardless, I suppose we need to note spellings (especially since they’re more phonetic) like surprize, advertize, devize, enterprize, compromize or advize to be more ‘correct’? ¶ I don’t care about this discussion anymore, and I am not going to mess with those entries anymore. I wish I did not mention this.--Pilcrow 10:44, 6 May 2011 (UTC)
The rule is we go by usage first, everything else is secondary. --Mglovesfun (talk) 11:19, 6 May 2011 (UTC)
I would like to remind people that while the -ise spellings of these words are largely British (and also used in most countries that use British spellings), the -ize spellings are not uniquely American and should not be marked as if they were. There is a huge difference between spellings like color and center, which are truly American spellings, and spellings like civilize and organize, which are not. The -ize spellings predominate in Canada (which otherwise mostly uses British spellings) as well as in academic writing published in the UK, and anything published by Oxford University Press (which is why it's called Oxford spelling). So marking civilise "British" is fine, but marking civilize "American" is highly misleading. —Angr 21:24, 8 May 2011 (UTC)

I have reverted the changes made to "criminalise" and "criminalize": "criminalize" should remain the main entry, per Ƿidsiþ and Angr. As mentioned by others, "etymologically consistent" is a non-consideration in Wiktionary.--Dan Polansky 12:18, 10 May 2011 (UTC)

I think that "criminalise" is traditionally British and "criminalize" is traditionally American. These tags don't really relate to usage but to traditional usage. I'd happily remove all such tags from -ise/-ize entries as the distinction is inaccurate, perhaps misleading. I've had a long standing argument with Mzajac (talkcontribs) as 'other dictionaries and linguists make the distinction'. I accept this, and have pointed out that paper dictionaries don't categorize, furthermore I've asked him to demonstrate that linguists don't differentiate between American spellings and American English. Favor might be an American spelling, but it's not American English; the word exists in all variations of English. And it seems to me he doesn't have an answer to either of these points. --Mglovesfun (talk) 12:40, 10 May 2011 (UTC)
Oh and I've proposed renaming {{British}} in line with our other regional templates, such as {{Australian}} (adjective) redirects to {{Australia}} (name of the country/region). Ditto for {{American}} or {{Swiss}}. --Mglovesfun (talk) 12:42, 10 May 2011 (UTC)

Quick poll: plurals as a subcategory of noun forms

The discussion above not looking too promising, how about always having [[Category:<langname> noun forms]] as a parent category of [[Category:<langname> plurals]]. This doesn't mean that a category for plurals has to exist, merely that when one does, its parent category should be noun forms. So, Category:English plurals becomes a subcategory of Category:English noun forms even if it is the only subcategory, and the only entry. This is quite helpful for interwikis, such as fr:Catégorie:Formes de noms communs en anglais. Furthermore, entries shouldn't be double categorized; an entry in Category:Catalan plurals shouldn't also be in Category:Catalan noun forms. --Mglovesfun (talk) 10:09, 6 May 2011 (UTC)


  1.   Support CodeCat 10:30, 6 May 2011 (UTC)


  1.   Oppose In many languages, "plurals" does not always mean "noun". —RuakhTALK 12:01, 6 May 2011 (UTC)
    That's one reason to deprecate Category:Plurals by language. --Mglovesfun (talk) 16:04, 6 May 2011 (UTC)
    I don't follow that logic, Mg. Why not have a Plurals cat, containing whatever POSes it contains (language-dependent)?​—msh210 (talk) 16:33, 6 May 2011 (UTC)
    You're gonna have to explain, sorry. --Mglovesfun (talk) 16:42, 6 May 2011 (UTC)
    In other words, have a category Plurals by language, containing subcategories English plural nouns (or whatever), Hebrew plural forms (with subcategories Hebrew plural noun forms, Hebrew plural adjective forms, and Hebrew dual noun forms, or whatever³), Foo plural verbs (for Foo a language that has no plurals except verbs), etc.​—msh210 (talk) 16:59, 6 May 2011 (UTC)
  2.   as Ruakh.​—msh210 (talk) 16:33, 6 May 2011 (UTC)


  1.   I support an alternative proposal: Category:Plurals of English nouns could be a subcategory of Category:English nouns. --Daniel. 16:00, 6 May 2011 (UTC)

Huge etymologies

I'm just reporting two pages I created a few months ago. The final results are pretty good to me because I like the idea of explaining detailed etymologies while not having clutter in the main namespace; however, I'm not sure if they should be actually Wikipedia-only material, and if we can trust them to have this sort of information.

In particular, I like to be able to look for when changes in the wording, such as pronunciation and trademark status occurred, but I wouldn't mind deleting certain cultural references such as "'Cowabunga' was reintroduced to the entertainment world via a 1965 Peanuts cartoon in which Snoopy uses the word whilst surfing".

For contrasting comparisons, the etymology of Skidoo is merely "From the trademark, Ski-Doo", the etymology of SPAM is simply "Blend of spiced and ham" (failing to tell that it was invented in 1937, for example) and the etymology of Tetris is rather big but doesn't display a number of historical details of the word. They suggest that people might want them as less detailed as possible, or didn't have the chance to detail them. If the former, then both new pages would simply need to be cleaned up to meet standards of conciseness.

That's it. --Daniel. 22:38, 7 May 2011 (UTC)

Why did you think that was a good idea? There's no advantage gained from having etymologies in separate pages, having it in the main entry is sufficient, even if it may be a bit large. -- Prince Kassad 20:14, 8 May 2011 (UTC)
I already replied to your question in the previous message, whereas I didn't say that we should avoid having huge etymologies in the main namespace. However, it's ridiculously easy to move two bits of information, so "where in Wiktionary these pages should be" is really not an issue. In addition, appendices are the standard place to have overly detailed notes that make entries slow to download, so it seemed to be the right thing to do for now. --Daniel. 20:55, 8 May 2011 (UTC)
How about our usual solution, collapsible boxes. Mglovesfun (talk) 20:58, 8 May 2011 (UTC)
I like the suggestion of using collapsible boxes. It doesn't solve the issue of making pages slower to download, but I think there's not any etymology section huge enough to do that yet. --Daniel. 21:02, 8 May 2011 (UTC)
Correct me if I'm wrong, but if the information's in a template (or subpage, or another namespace), it still has to be loaded. So this solution doesn't save any time. Mglovesfun (talk) 22:54, 8 May 2011 (UTC)
Slower pages? Which etymology sections are actually large enough to pose a problem? From what I've seen, even 56k users shouldn't have a problem with any of them. -- Prince Kassad 23:18, 8 May 2011 (UTC)
While I don't like a lot of stuff above the definitions, I don't think this is the way to solve it. IMO etymology should stay on the page.​—msh210 (talk) 15:38, 9 May 2011 (UTC)
Keeping etymologies in the mainspace pages and outside of appendices is what I tend to support. Long etymologies can be put into collapsible boxes, and it is doubtful to what extent they belong to Wiktionary anyway. Page loading delay seems a non-issue, especially when compared to translations. Change tracking dedicated to etymology is an advantage, sure, but that pertains to any Wiktionary section: going along this way would lead to breaking Wiktionary pages down into subpages for the ease of tracking of history per section. --Dan Polansky 12:26, 10 May 2011 (UTC)

Favourite bugs?

As the release of MediaWiki 1.17 approached, the devs made a concerted effort to address a lot of issues including some long-standing bugs, initiatives, and enhancements reported/requested by the sister projects. This rather caught us flat-footed as this is not traditional. I suggested a couple of things, including Conrad Irwin's Transliterator, which has received a partial review.

In looking around for other bugs important to sister projects, I found Wikisource's Wishlist, which seemed like a really good idea and I showed it to Dmcdevit, who created a similar one for en.WT. (Wikisource has a central all-languages project due to its very different project history.)

If we can all add bugs important to this project to this list, and talk about which are of higher priority, it will help us communicate with the developers as to what we would like to see added/created/changed for our project. Since we've gotten some attention, let us do everything we can to make it easy to work with! - Amgine/talk 18:32, 8 May 2011 (UTC)

Hmm, bugzilla:15607 (Interlanguage extension) would be really useful here, but they seem to be focusing on Wikipedia use, especially with the design. bugzilla:13228 ("Adding custom inter-namespace tabs") would be helpful for citations tabs... --Yair rand 20:37, 8 May 2011 (UTC)
Add the bugs to the Wiktionary wishlist. That is for our priorities, and the bugmeister is aware of the page. - Amgine/talk 16:10, 10 May 2011 (UTC)

making definitions findable

(Of course this has come up before.) There are so many feeders-back who can't find our definitions. IMO we must make them more findable. How shall we do so? Propositions that have been mentioned before include those listed under the following headers; please add headers for propositions I've forgotten (or missed, or new ones). Please indicate, under each proposition you like (or are willing to accept if the ones you like are not liked enough), that that's the case. Thanks!​—msh210 (talk) 07:06, 9 May 2011 (UTC)

  • I really don't like the simple yes/no format of this poll. In my opinion, the definition lines should be much nearer to the top, with the etymology at the bottom, the pronunciation section reformatted and right-floated, with only the part of speech header and headword line (which should really be merged into one line) above the definitions, along with tabbed languages pushing the whole thing much nearer to the top. A slight change in the font size of definitions might also be helpful. (Perhaps there should be a "Discussion" heading at the bottom of this section?) --Yair rand 07:45, 9 May 2011 (UTC)
  • Re: "I really don't like the simple yes/no format of this poll": I agree. It's been a while since we last discussed this, so I think we should start by gathering ideas. —RuakhTALK 11:02, 9 May 2011 (UTC)
  • Actually, taking that a bit further — I think we should really start by figuring out what the problem is. For example, if the problem is that people are looking at the wrong page — one feedbacker went to [[Bark]] instead of [[bark]] — then none of the below suggestions will help even a bit. —RuakhTALK 14:59, 9 May 2011 (UTC)
  • Well, that's a problem also, but I think it's abundantly clear that a common problem is that people can't find the definitions on the correct pages.​—msh210 (talk) 15:09, 9 May 2011 (UTC)
  • Could you give some links to specific feedbackers who have made that clear? What else have they said? (In particular, did they give the impression of being a small minority of users, the total idiots who are beyond all help?) —RuakhTALK 16:00, 9 May 2011 (UTC)
  • I don't know which class of users leave feedback, and, among those, which class leave this sort of feedback. Feedback entries that indicate a user couldn't find definitions include the following (just from the Feedback page as it is now), some more clearly so than others: Feedback on sora (you'll have to look for it; many of the sections on the page have the same header); possibly the feedback including the word thisa; likely the feedback on svart; likely the feedback including the phrase take the definition; likely the feedback on adz; the feedback including the word helpfull; likely the feedback on francophone; likely the feedback containing the phrase submit the definitions; the feedback on turtle; the feedback on rolf; and possibly any of the feedback entries that say just "unhelpful" or the like. There are 395 L2 sections at WT:FEED; I've listed ten (not counting the "unhelpful" ones). That's 2.5%, which, if it's representative of users as a whole (which I have no reason to think it is, but who knows whether it's more or less), is very, very high.​—msh210 (talk) 16:48, 9 May 2011 (UTC)
  • So, these: sora · thisa · svart · take the definition · adz · helpfull · francophone · submit the definitions · turtle · rolf. Honestly, I'm not convinced that those represent people who went to the right page and couldn't find the definitions. Quite a few were coming from Special:Search, which makes it seem like they didn't find the page they were trying to. The one who complained about "adz" presumably couldn't find the definition because it's at [[adze]]; none of the below suggestions would help very much with that. Similarly for "francophone": the definition's at [[Francophone]]. (In both of those cases, clearly labeling our non-definitions as "Definitions" might have made things more clear, but I wouldn't bet money that it's enough.) And so on. —RuakhTALK 17:09, 9 May 2011 (UTC)
  • Discussion is ongoing in the proposition sections. I listed this here and encouraged people to add more options (rather than listing it at WT:V, say) in order to generate discussion as well. If you feel a Discussion section is appropriate, add it, of course; IMO the existing sections (including this lede) are good for discussion.​—msh210 (talk) 14:53, 9 May 2011 (UTC)
  • Yair, how do you visualize the headword and POS being on one line? I mean, how would it be worded? Would it be a header, or just a (boldface, say) line? Etc.​—msh210 (talk) 15:09, 9 May 2011 (UTC)
Status quo
  1. From day one, I have never had any problem finding the definition. Basically, I just look at the screen. SemperBlotto 07:06, 10 May 2011 (UTC)
  2. This is okay. Maybe it can be improved, but the implementation details are critical. As Ruakh has pointed out, it is unclear how many of the feedbackers really have a problem with the current heading structure rather than with something else. --Dan Polansky 11:05, 10 May 2011 (UTC)
  3. I think it's fine the way it is - sorting by etymology makes more sense linguistically, I don't think finding the definitions is that hard. ---> Tooironic 00:42, 11 May 2011 (UTC)
Setting the definitions' background to a different color
  1. I'm willing to accept this as better than the status quo, if my preferences aren't shared by the community.​—msh210 (talk) 07:06, 9 May 2011 (UTC)
Setting a border around the definitions
  1. I'm willing to accept this as better than the status quo, if my preferences aren't shared by the community.​—msh210 (talk) 07:06, 9 May 2011 (UTC)
  2. This is a new idea to me, but it might work, and it wouldn't interfere with the current format. What would it look like? What is the newcomer's problem really - not finding the definition during their first visit in the site? Colors would not help, since how would he know which color to follow - why not the red or blue links? I find the en.wiktionary format restful to the eye, and I'd hate to spoil it with colors, like some wiktionaries do. --Makaokalani 15:42, 9 May 2011 (UTC)
  3. This is what User:Internoob/colour.js does. :) —Internoob (DiscCont) 00:44, 11 May 2011 (UTC)
Setting the definitions' text to a different color
  1. I'm willing to accept this as better than the status quo, if my preferences aren't shared by the community.​—msh210 (talk) 07:06, 9 May 2011 (UTC)
Setting the definitions' font size larger
  1. I'm willing to accept this as better than the status quo, if my preferences aren't shared by the community.​—msh210 (talk) 14:56, 9 May 2011 (UTC)
  2. This could have some charm, but what happens with quotations and example sentences? What are the implementation details, like, are we going to place each definition in a template? --Dan Polansky 11:05, 10 May 2011 (UTC)
    Implementing this would be simply a matter of editing Common.css. Quotations and example sentences could be unaffected. --Yair rand 11:52, 10 May 2011 (UTC)
    Makes sense; thanks. This makes this option rather attractive, as it can be implemented soon without any changes to the wiki markup and sectioning structure. If you would feel like posting the CSS required to do this, I and other people could try this for themselves by editing their monobook.css, vector.css or whatever they have. --Dan Polansky 12:02, 10 May 2011 (UTC)
    ol li{font-size:16px}ol li dl dd, ol li ul li{font-size:13px} (replacing 16px with whatever size). --Yair rand 12:14, 10 May 2011 (UTC)
    Thanks. After trying it out, I am not so sure larger font is a good idea. But it may be a problem of getting used to it. --Dan Polansky 12:33, 10 May 2011 (UTC)
Adding an L3 "Definitions" header
  1. I prefer this.​—msh210 (talk) 07:06, 9 May 2011 (UTC)
  2. This could have some charm, but it makes the table of contents even longer. If the table of contents were so simplified that it only showed language names, the "but" would no longer matter. --Dan Polansky 11:05, 10 May 2011 (UTC)
Putting definitions before any other info within each language section
  1. I prefer this.​—msh210 (talk) 07:06, 9 May 2011 (UTC)
  2. I prefer this [interpreted along the lines MG and msh describe]. DCDuring TALK 11:36, 9 May 2011 (UTC)
  3. I prefer this, though not for myself, but because users are asking for it. Mglovesfun (talk) 11:42, 9 May 2011 (UTC)
    • Question: (not specifically to Mglovesfun) How would this be done? The definitions would be placed before all other info, including part of speech and inflection? Would this be true for all definitions, so that no other information can be displayed except under every definition from every part of speech? --Yair rand 11:49, 9 May 2011 (UTC)
      • For one possibility, see he:פסח. Information such as part of speech and inflection appears in side-boxes; etymologies and related terms and so on appear below definitions; not all definitions come first — the page is split by word (the noun pésakh has a section, the verb pasákh has a section, the adjective-cum-noun piséakh has a section) — but since the definitions for the first word appear very high up on the page, it arguably makes the structure as a whole more obvious, and it's clearer where to find definitions for each word. (That said, I don't understand how anyone could possibly find our link to leave feedback but not be able to find our definitions, so it's hard to speculate about what other system would work better!) —RuakhTALK 13:57, 9 May 2011 (UTC)
    I think before all other info doesn't mean literally before any other character or string of characters. But rather having definitions sections, including their headers, like ===Noun===, ===Verb=== before any other section. Thus, etymology, alternative forms and pronunciation are included after the definitions. Stephen G. Brown on WT:FEED raises what I feel is a legitimate point over what to do when there are multiple etymologies. --Mglovesfun (talk) 13:33, 9 May 2011 (UTC)
    Yeah, that was my intent also (having definitions before any other section, not before any other info). I don't know where SGB's comment on multiple etymologies is, but they are definitely a concern; but see Ruakh's comment, just above, "since the definitions for the first word appear...".​—msh210 (talk) 14:53, 9 May 2011 (UTC)
    And one way to handle etymologies is at user:msh210/ELE.​—msh210 (talk) 14:58, 9 May 2011 (UTC)
  4. Yes. But this definitely needs a VOTE. -- Prince Kassad 12:09, 9 May 2011 (UTC)
  5. Support. The ToC could also be reduced like they did on fr.wikt. —Internoob (DiscCont) 22:24, 9 May 2011 (UTC)
  6. This could have some charm, but the implementation details are critical: how do you deal with multiple etymologies and multiple pronunciations? Should all definitions be listed next to each other or should they still be distributed between part-of-speech sections? If the definitions are distributed, only the definitions of the first etymology come before everything else. Later: I see, the implementation details are exemplified here: User:Msh210/ELE/Example, or in this revision in particular. I am not sure I like the particular exemplification. The first thing I would do to make definitions easier to spot is remove unsuitable boldface: from the year of a quotation, from the quoted headword, and from the headword in an example sentence. I have removed boldface from these cases in my custom CSS in monobook.css, but here, as the CSS does not apply, I see again how distracting the boldface really is. --Dan Polansky 11:20, 10 May 2011 (UTC)
    Well, user:msh210/ELE/Example is just one possible way to put the definitions first, and is, I assume, not what's on the minds of those commenting here (besides myself).​—msh210 (talk) 15:55, 10 May 2011 (UTC)
    I think it would be good to look at what possible use cases there are. Most people will come here wanting to know what a word means based on its spelling, so they will want to see the definitions first. But we can also cater to other possibilities, maybe by using JavaScript to re-sort the headings according to the order the user wishes. That way, if the user wants them sorted by etymology, they can. —CodeCat 16:05, 10 May 2011 (UTC)
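
CodeCat's idea of re-sorting headings according to the reader's wishes could be sketched roughly as follows. This is only an illustration on plain data, not a working gadget: the function reorderSections and the { heading, body } shape are hypothetical, and a real script would still have to read the sections out of the page and write them back.

```javascript
// Hypothetical sketch: reorder an entry's sections by a user-chosen heading order.
// Headings missing from preferredOrder keep their original relative order
// and are placed after the preferred ones.
function reorderSections(sections, preferredOrder) {
  const rank = (heading) => {
    const i = preferredOrder.indexOf(heading);
    return i === -1 ? preferredOrder.length : i;
  };
  return sections
    .map((section, index) => ({ section, index }))
    // Sort by preference rank first, then by original position to stay stable.
    .sort((a, b) => rank(a.section.heading) - rank(b.section.heading) || a.index - b.index)
    .map((entry) => entry.section);
}

// Example: a reader who wants definitions first.
const entry = [
  { heading: 'Etymology', body: '...' },
  { heading: 'Pronunciation', body: '...' },
  { heading: 'Noun', body: '...' },
  { heading: 'Verb', body: '...' },
];
const definitionsFirst = reorderSections(entry, ['Noun', 'Verb', 'Etymology']);
// Resulting heading order: Noun, Verb, Etymology, Pronunciation
```

A reader who prefers the etymology-led layout would simply pass a different preferredOrder, so the stored wikitext would not need to change.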


Have a bot running around {{sandbox|cleaning}} Project:Sandbox every twelve hours. It doesn't get much attention anyway. How 'bout it? TeleComNasSprVen 10:44, 10 May 2011 (UTC)

I don't think it's necessary, but I don't see the harm. More useful IMO would be something like w:user:SineBot (signs unsigned comments on specific discussion pages) and a bot that goes through a dump checking ISBNs and ISSNs for correct checksums and tagging entries with problems in that regard.​—msh210 (talk) 15:48, 10 May 2011 (UTC)
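
For the checksum bot, the checks themselves are standard arithmetic: ISBN-10 and ISSN use a weighted sum modulo 11 (with X standing for 10 in the last position), and ISBN-13 uses alternating weights 1 and 3 modulo 10. A sketch, with hypothetical function names, and with no attempt at scanning a dump or tagging entries:

```javascript
// Strip hyphens and spaces, and upper-case a trailing "x".
function normalize(code) {
  return String(code).replace(/[\s-]/g, '').toUpperCase();
}

// Weighted sum with descending weights (n, n-1, ..., 1), valid when divisible
// by 11. "X" counts as 10 and is only accepted in the last position.
function mod11Valid(chars) {
  if (!/^\d+[\dX]$/.test(chars)) return false;
  let sum = 0;
  for (let i = 0; i < chars.length; i++) {
    const value = chars[i] === 'X' ? 10 : Number(chars[i]);
    sum += value * (chars.length - i);
  }
  return sum % 11 === 0;
}

function isValidIsbn10(code) {
  const c = normalize(code);
  return c.length === 10 && mod11Valid(c);
}

function isValidIssn(code) {
  const c = normalize(code);
  return c.length === 8 && mod11Valid(c);
}

// ISBN-13: digits weighted 1, 3, 1, 3, ...; valid when the sum is divisible by 10.
function isValidIsbn13(code) {
  const c = normalize(code);
  if (!/^\d{13}$/.test(c)) return false;
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    sum += Number(c[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return sum % 10 === 0;
}
```

A bot built on this would only flag a number as suspect; a failed checksum can come from a typo in the entry or from a misprinted number in the source itself, so a human would still need to look.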

problem with "Add translation"

I can't see the link "Add translation" below the translations. I usually use Opera. If I use Firefox I see the function and I can add translations easily. Can anyone confirm this problem? --Yoursmile 17:25, 10 May 2011 (UTC)

Redundant lexicons

In the process of creating categories to implement the results of Wiktionary:Votes/2011-04/Lexical categories, I made introductions that try to explain individual labels such as rare, politically correct, etc.

Category:English lexicons lists them all.

Feel free to double check and improve them. In particular, the pairs "archaic"/"obsolete" and "pejorative"/"offensive" are very redundant to me, so they got redundant descriptions. Reading old discussions and introductions of old categories unfortunately didn't help much. --Daniel. 04:09, 11 May 2011 (UTC)

Archaic is what you use to sound old; obsolete is what nobody uses, and wouldn't be understood if they did.--Prosfilaes 05:13, 11 May 2011 (UTC)
Isn't it dated if you sound old? I thought archaic meant that it wasn't just dated but also no longer in common use, even though people still understand it. And obsolete is if people don't understand it either. —CodeCat 10:51, 11 May 2011 (UTC)

Template:plural of

Currently, {{plural of}} automatically defaults to "plural form of <lemma>" unless the cap or dot parameters are specified. However, according to Wiktionary:ELE#Definitions, "[e]ach definition may be treated as a sentence: beginning with a capital letter and ending with a full stop", and I think that we should change this template to reflect this; i.e. "Plural form of <lemma>." should be the default instead. TeleComNasSprVen 06:16, 11 May 2011 (UTC)
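To make the difference between the current and the proposed default concrete, here is a rough Python model of the output. The `cap` and `dot` flags are hypothetical stand-ins loosely modelled on the parameters mentioned above; the real template's parameter handling may differ.

```python
# Hypothetical model of the definition text produced by a "plural of" template.
# `cap` and `dot` are illustrative flags, not the template's actual interface.

def plural_of_definition(lemma: str, cap: bool = False, dot: bool = False) -> str:
    text = ("Plural" if cap else "plural") + f" form of {lemma}"
    return text + "." if dot else text


# Current default: lower case, no full stop.
current_default = plural_of_definition("test")

# Proposed default: treated as a sentence.
proposed_default = plural_of_definition("test", cap=True, dot=True)
```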

I support "Plural form of <lemma>." being the default. --Daniel. 06:18, 11 May 2011 (UTC)
More substantially WT:GP#Form of templates. I still think the final dot shouldn't be used; this may be added by hand, but the template shouldn't make it obligatory. As you've pointed out, it does say "may be treated as a sentence"; there was a never-enacted vote created by Visviva to change this to "should be treated as a sentence". --Mglovesfun (talk) 11:36, 11 May 2011 (UTC)

Seeing flood-flag contributions

If someone uses the flood flag to make contributions, how can people see these contributions in recent changes? I would expect some conspicuous button such as "Show/Hide flood flag contribs", but I couldn't find anything. Apparently it is impossible to see the contributions in question there, which would be a bad thing. AFAIK, bot edits are easier to see. --Daniel. 08:32, 11 May 2011 (UTC)

Flood flag edits count as bot edits. You can see them in RC by clicking "Show bots". --Yair rand 08:36, 11 May 2011 (UTC)
OK; much better. Thank you! --Daniel. 08:39, 11 May 2011 (UTC)

Distinguishing between dated, archaic, obsolete terms and senses

Currently we have categories for terms that are dated, archaic or obsolete. But these categories make no distinction between cases where the term as a whole is old, and cases where the term itself is still in use, but it is one or more specific senses that is old. I think this distinction is useful enough that we should categorise accordingly... what do you think? —CodeCat 10:56, 11 May 2011 (UTC)

If I'm understanding you correctly, ideally yes but how? The first term in Category:Obsolete I notice is barrow, which has an obsolete meaning but isn't itself obsolete. OK this is a bit misleading, but it would be impractical, though not impossible, to distinguish between English obsolete terms and English terms with obsolete meanings. You'd need a parameter in {{obsolete}} such as {{obsolete|defn=1}}, where 'defn' categorizes in Category:English terms with obsolete meanings and when it is not present, in Category:English obsolete terms. --Mglovesfun (talk) 11:07, 11 May 2011 (UTC)
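The `defn` switch suggested above amounts to a one-line branch. A minimal Python sketch of the categorization logic, assuming the two category names given in the comment (the function name is invented for illustration):

```python
# Sketch of the proposed categorization: `defn` marks an obsolete sense of a
# term that is otherwise still in use. The function name is hypothetical.

def obsolete_category(language: str, defn: bool = False) -> str:
    if defn:
        return f"Category:{language} terms with obsolete meanings"
    return f"Category:{language} obsolete terms"
```

Under this scheme an entry like barrow, which is current but has an obsolete sense, would be tagged with `defn` set and land in the first category.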
How useful is that distinction? How would you use it? How does it improve Wiktionary? I fail to see reasons to support this proposal, so people can help me by replying to some of these simple questions. --Daniel. 11:11, 11 May 2011 (UTC)
When I look at a category like archaic, I often wonder what exactly is archaic about the word since to me it's usually just a common word. Only in a few cases is the word itself actually archaic. It may not be useful as such but the current situation is confusing. —CodeCat 11:14, 11 May 2011 (UTC)
Well, this situation can occur with many dichotomies, not only archaic/nonarchaic. When you see a member of "Category:English nouns", be prepared for the possibility of seeing English verbs defined there too; and maybe one or two French nouns. The categories label portions of entries rather than full entries. In fact, when an entry displays an archaic together with a nonarchaic sense, you can say the entry simply displays two words that are homographs with different usages. --Daniel. 11:24, 11 May 2011 (UTC)
Put in the category description "archaic terms or terms with archaic meanings". --Mglovesfun (talk) 11:30, 11 May 2011 (UTC)
The context tags archaic and obsolete would probably benefit from some effort to improve their application. That would suggest that we need to have some discussion of actual criteria for their application. In addition, we need to make their use consistent with {{dated}} and Widsith's {{defdate}}. The last thing we need however is to lose information in a misguided effort to achieve some kind of top-down uniformity reform-by-categorization, letting the tail wag the dog. DCDuring TALK 11:52, 11 May 2011 (UTC)

Substrate derivations

Currently we have no way of categorising words that are thought to originate from an unknown substrate language. I think this would be useful especially for the Germanic languages, which are thought to have inherited many words from an unknown language spoken in what is now Germanic territory. —CodeCat 11:00, 11 May 2011 (UTC)

I created {{etyl:substrate}} and Category:ht:Substrate derivations for this exact purpose in Haitian Creole. —Internoob (DiscCont) 23:08, 11 May 2011 (UTC)

Alternative spellings and forms again

I think that, along the lines of Wiktionary:Votes/pl-2010-07/Alternative forms header, we should merge the categories and templates for alternative forms and alternative spellings. People have confused the two pretty severely it seems and it's not worth it to have one header for each but two categories. One example of this is {{alternative capitalization of}} gives the category for alternative forms rather than the expected one for spellings. —Internoob (DiscCont) 22:59, 11 May 2011 (UTC)

In my opinion, the basic idea of having separate categories for spellings and forms isn't very bad if used with particular care; but the proposal of merging them is better especially if it makes it easier to categorize entries and navigate them. Support. --Daniel. 23:13, 11 May 2011 (UTC)
I support this, in fact I deleted Category:Old French alternative spellings awhile ago and moved everything into alternative forms; furthermore as Ruakh pointed out, these aren't like archaic or dated forms; {{alternative form of}} is really only used to avoid repeating a definition or definitions, it's not a 'context' label like {{obsolete spelling of}}, so I think all the alternative form categories should be hidden categories. Mglovesfun (talk) 23:21, 11 May 2011 (UTC)
OK. It seems uncontroversial, so: done. The template magic helped me to get rid of categories of alternative spellings, archaic spellings, etc. and replace them by alternative forms, archaic forms, etc. except when they weren't templatized in the first place. The descriptions of categories for forms also became more generic, which is a good thing. --Daniel. 23:54, 11 May 2011 (UTC)

Low German

I know this has been brought up once, but neither was it solved nor sufficiently explained. When using the nds language code, the language given is Low Saxon. As a citizen of Germany I must mention that both the European Union and the German government refer to the language as Low German. Further I find the wiktionary definition of Low Saxon doubtful, if maybe referenced. It seems to contradict the tenor of most (if not all) of the Wikipedia articles on both Low Saxon and Low German. At least it has little to no connection to everyday use and (more importantly) none to the Eigenbezeichnung (it means what-one-calls-himself, I don't know the word, sorry) of the Low Germans. As a speaker of Low German I must stress intensely, that the use of Low Saxon for Low German is highly offensive for non Low Saxon-speaking Low Germans. The Low German community itself is most strictly differentiating between Low German and Low Saxon, with every Low Saxon naming his language as "Low German". Imagine referring to English only as "American". In Holland the language is indeed called Nedersaksisch (Low Saxon), but this is mostly (if not only) from non-speaking official side, with the speakers calling it Nederduits/Nedderdütsch (Low German) and is based on the fact, that there are only Low Saxon dialects spoken in Holland. The "nedersaksisch" Wiki on the other hand understands itself as a dialectal wiki, rather than a language wiki, because of the different base of orthography (Dutch writing instead of German/Middle Low German). As for avoiding "Modern" tags in languages... Neither English, nor German, nor Dutch are called Modern/New English, German, Dutch, despite there being Middle English, Dutch, (High) German... So I starkly request that the templates are changed to Low German - or Saxon, which would be highly accurate regarding history and wholly unintelligible to probably everybody.

Again, no matter what the ominous (since I haven't found the reference) linguistics said, the institutes at the Universities are called "Niederdeutsch" (Low German), the speakers call their language only this and nothing else, the European Union calls it Low German and to any non-Low Saxon the current state could almost be an insult. (Maybe I'm too strong, but at least people receive it as plain wrong.) Hence I request to change the nds-templates FROM Low Saxon TO Low German. I also ask you, to inform me, if this belongs to the grease pit.

And I hope, that stark has no rude sound in English. Dakhart 00:03, 12 May 2011 (UTC)

After looking at the old (2010-)topic again, I might add:

1. It shows that there is the need to clearly sunder both Low German and Low Saxon.

2. If there is actually the need to have a tag to refer "the whole history of Low German", then please make one up as for ANY other language on this Wiki. Correctness over simplicity. Dakhart 00:11, 12 May 2011 (UTC)

Done (by Stephen G. Brown). MglovesfunBot is as we speak converting uses of Low Saxon to Low German in translations and descendants sections. --Mglovesfun (talk) 16:25, 25 May 2011 (UTC)

Poll: External linking

Currently, several entries in Wiktionary still use the external links header, see Special:Search/external links. However, in Wikipedia, it is recommended to add links leading outside the project to be formatted into the "References" header rather than "External links" because it would have more useful encyclopedic content; and indeed throughout much of the wider Wikimedia community "External links" is discouraged or does not have wide support. Within Wiktionary itself, there are often three common types of external links: valid citations or quotations, which belong either in the Citation namespace adjacent to an entry or otherwise located directly underneath the definition itself or in the "References" header; interproject links, such as those to Wikipedia, which belong at the top of a language entry or in a "See also" section; or spam links to personal websites, which should be promptly removed from Wiktionary entirely. TeleComNasSprVen 01:57, 12 May 2011 (UTC)

Some related reading material: Wiktionary:Beer_parlour_archive/2007/November#Inter-project_links:_"See_also",_or_"External_links"?, Wiktionary:Beer_parlour_archive/2007/September#External_links, Wiktionary:Beer parlour#External links, Wiktionary:Links

I support the use of "External links"

  1.   Support Daniel. 05:41, 12 May 2011 (UTC) The reasons provided by the pollmaster (I invented or reinvented this word now.) mainly sounded to me as "let's just imitate Wikipedia!" I'd disagree about that; we don't need to imitate Wikipedia. Well, using "External links" to list external links is a good idea, because... I, personally don't want to use the imperative tense "See also!" to direct users of a dictionary to an encyclopedia. This project is good enough alone for its scope, and people are welcome to see other projects if they want to enjoy different scopes for additional information anyway. --Daniel. 05:41, 12 May 2011 (UTC)
  2. Like Daniel., I think "External links" is the best place for WP links.​—msh210 (talk) 07:39, 12 May 2011 (UTC)

I oppose the use of "External links"

  1.   Oppose I don't view links to sister projects as "external". They fit well in "See also", often better than the loose assortment of miscellaneous entries that clutter "See also". References seems to encompass dictionary links. Typically that leaves External links for spam. DCDuring TALK 10:36, 12 May 2011 (UTC)

I use both "External links" and "See also"

  1. Well, I guess: "See also" is good for internal links that don't belong anywhere else in the entry but do belong in the entry. But I don't remember the last time I've added a link to an entry, putting it under "See also".​—msh210 (talk) 07:39, 12 May 2011 (UTC)

I prefer not to say (abstain)


More generally, I only use 'external links' when there's already a lot in the ===See also=== section, so I don't want external links to be disallowed, but I don't see a need for an external links section for every external link if just putting it at the bottom of see also is just as good. --Mglovesfun (talk) 10:48, 12 May 2011 (UTC)

But ===References=== would work too instead of external links. --Mglovesfun (talk) 10:55, 12 May 2011 (UTC)

Wiktionary:Neutral point of view

I think that the policy template ought to be removed from this particular page. In the first place, it's short and too inaccurate, and almost never mentioned throughout Wiktionary discussion, and secondly it was only added by the main creator of the page Vildricianus without any previous votes designating that this page should be policy. TeleComNasSprVen 02:40, 13 May 2011 (UTC)

Well, Vild merely tagged it with {{Policy-SO}} (for "semi-official policy"); it was Connel MacKenzie who made it supposed "policy", when he redirected all the policy-like templates to {{policy}}. Some of those policy-like templates, such as {{Policy-TT}}, have been restored to their former meaning; I wonder if a good first step might be to do the same with {{Policy-SO}}. (But that would have the consequence of making WT:CFI no longer "policy", either, so it's not something to be bold with.) —RuakhTALK 03:03, 13 May 2011 (UTC)

Treatment of Proto-Norse

Proto-Norse, or Proto-North Germanic, is the common ancestor of the North Germanic languages. Despite its name, it is actually attested in several inscriptions, but is otherwise largely unknown. The problem is mostly that it is largely identical with its ancestor, Proto-Germanic, and is often seen as simply the northern dialect of it since the differences are very minor. This means that there is a potential for a huge duplication of effort, since a lot of Proto-Norse words will have a Proto-Germanic equivalent that is exactly identical with the Proto-Norse term, and there is no consistent way to differentiate between them. For this reason I would prefer if we do not allow Proto-Norse at all, or if we do, we should treat it as dialectal Proto-Germanic, not a language in its own right. —CodeCat 11:06, 15 May 2011 (UTC)

Hmm, I thought proto languages by definition are unattested, so when it's attested wouldn't it be Early Old Norse? Also we have {{etyl:gmq}}; you can use {{recons|foo|lang=gmq}} to link to Appendix:North Germanic/*foo if you like. I realize this doesn't really answer the question but hopefully it provides some useful information anyway. --Mglovesfun (talk) 09:55, 17 May 2011 (UTC)
Not all proto-languages are unattested. Proto-Romance, for example, is a Latin dialect. My point in this case is more that we shouldn't treat Proto-Norse as a separate language anymore than we treat Late Latin as a language distinct from Latin proper. Proto-Norse is closer to Proto-Germanic than it is to Old Norse. —CodeCat 10:35, 17 May 2011 (UTC)
Banning Proto-Norse would make this wiktionary a bit incompatible to the outside world, because Proto-Norse does exist there. From a Proto-Germanic point of view, Proto-Norse might be just a dialect. From Old Norse point of view, however, Proto-Norse is relevant, much more relevant than Proto-Germanic. Old Norse grammars often present Proto Norse forms, not Proto Germanic ones, in order to explain Old Norse phenomena such as breaking and mutation. --MaEr 18:23, 18 May 2011 (UTC)
In this case the duck test probably helps though. If the forms they cite are called Proto-Norse but are really Proto-Germanic, should we treat them differently? I am not so much against Proto-Norse as much as I am wary of duplicating a lot of effort for very little gain. The majority of Proto-Norse reconstructions will be identical to Proto-Germanic ones, but if we split our efforts then we will have less in the end than if we combined our efforts. —CodeCat 19:17, 18 May 2011 (UTC)
There are minor differences between Proto-Norse and Proto-Germanic, so these forms won't be identical. But this is not the point. It's a good idea to turn Proto-Norse words to Proto-Germanic ones if the Proto-Germanic word has a dictionary entry. The Proto-Norse word won't have an entry, I guess, and I am not planning to write any.
But not every Proto-Norse word can be transformed into a Proto-Germanic one. If a word is well attested in the North Germanic languages and completely missing outside, we cannot simply assume that it is Proto-Germanic, too, even if we change the Proto-Norse form into a Proto-Germanic one.
Another reason, especially for myself, is the following: if I add the etymology of an Old Norse word my source might mention the Proto-Norse ancestor but not the Proto-Germanic one — if there is any, see above. In this case, I prefer adding the Proto-Norse ancestor only. The alternatives would be: guessing the Proto-Germanic one, omitting the ancestor (because Proto-Norse is "banned"), disguising the Proto-Norse word as Proto-Germanic (which would be inaccurate), ...
--MaEr 12:59, 21 May 2011 (UTC)
post scriptum: I have been writing quite slowly and then there was an edit conflict. My comments above don't reflect CodeCat's short comment below, yet. -- MaEr
By your reasoning, we should treat words that appear only in 'American' and not in 'English' as a separate language as well. Because the situation is analogous to that, they are words that appear in one dialect area of a continuum of mutually intelligible languages. I would even say the differences (especially in speech) between American and English are greater than those between Proto-Norse and Proto-Germanic.
If you are afraid that you'll accidentally add a later Norse form of a word, it should be easy to devise a kind of scheme to undo the changes. In most cases you won't even need to, but for example PN neuter -a is written in PG entry names as -an and in reference as -ą (the difference is similar to how we treat macrons in Latin). An example of how dialectal forms may be treated is Template:termx, which occurs only in West Germanic. —CodeCat 14:20, 21 May 2011 (UTC)
The ajjaz example is interesting but there seems to be a misunderstanding: I'm not planning to create Proto Norse entries. I just mention them in the etymology sections of Old Norse entries.
Indeed, in my understanding, Proto-Germanic is the same as Common Germanic. If, however, a majority favours a different definition, I'll agree. For me, a definition is a tool, not a dogma. --MaEr 08:17, 22 May 2011 (UTC)

I just realised something else. Since Proto-Norse is an attested language, it qualifies under our CFI to be added to the main namespace. However, it currently has no language code. Is {{non-pro}} or {{non-proto}} ok? —CodeCat 12:43, 21 May 2011 (UTC)

When Proto-Norse is a subset of Proto-Germanic, and Proto-Norse is attested: does this mean that Proto-Germanic is an attested language, too? Will the Proto-Germanic entries be moved from the appendix into the main namespace? Just wondering... --MaEr 08:17, 22 May 2011 (UTC)
We can't move the entries because they are unattested themselves. Only attested terms can go in mainspace. However a word such as (or maybe its runic equivalent?) could go in mainspace if we had a language code for it, because it is attested on the golden horns of Gallehus.
I agree we can't treat them differently and the same at the same time. So what I suggest is that a few people with knowledge of Proto-Germanic regularly check Index:Proto-Norse, and update the entries to point to Proto-Germanic if they can. That way, people can still add PN etymologies but they will eventually point to the right place if possible. —CodeCat 09:51, 22 May 2011 (UTC)
I think we should by all means avoid adding Proto-Norse to the mainspace. The (very small) corpus of runic inscriptions should be added to a separate appendix with an alphabetical index, and the attested forms can be mentioned at the Proto-Germanic (Proto-Norse) entry (which is also in an appendix, as is the norm). These are primarily reconstructed languages, with only fragments attested, and not comparable to e.g. Gothic, which has a Bible translation with tons of prose that is intact. As for the entries, perhaps we don't need separate entries for the Proto-Norse forms generally, although it might be nice to see at least some room for separate inflection paradigms, etc. – Krun 09:53, 27 May 2011 (UTC)
We do have Category:Phrygian language, which I believe is even less attested than Proto-Norse. And as for inflections and reconstructions, the problem is that Proto-Norse is not the same as Proto-North Germanic. Proto-North Germanic would be the last common ancestor of the North Germanic languages. But if we don't count Proto-Norse, that last common ancestor would actually be after the first attestation of Proto-Norse, since the last common ancestor was a late form of Proto-Norse that began to diversify into dialects. The language of the Gallehus horns was a common ancestor, but it was not the last common ancestor. This would not be much of a problem, if it weren't for the fact that reconstruction using the comparative method always yields the last common ancestor. So, any reconstructed terms would actually be late Proto-Norse, and if they're not then they must have been reconstructed with East and West Germanic evidence as well, which means that it's already Proto-Germanic. —CodeCat 10:14, 27 May 2011 (UTC)


Do people use a POS header "Punctuation"? (Or "Punctuation mark", perhaps?) This entry (among many others) could benefit from that idea, since its current header is "Symbol", which is too generic. --Daniel. 21:13, 15 May 2011 (UTC)

Well, since no one replied or objected, I'm going to add a small number of "Punctuation mark" headers, like 15 or less per week in the next weeks. Feel free to comment about this decision; I'll keep a personal list of the affected entries in case people have better ideas in the future (such as reverting to the "Symbol" header, or using another header). --Daniel. 12:24, 19 May 2011 (UTC)
I don't see that it's better than "Symbol", but don't see that it's worse (except where a symbol is both punctuation and something else). No objection from me (for symbols that are only punctuation). I think more editors' input would be nice before it's fiddled with, but no one seems to be saying anything (this my comment is a little delayed, too).—msh210 (talk) 16:37, 19 May 2011 (UTC)
Four days is nowhere close to enough exposure for a never-before-discussed change. We have numerous changes at various early stages of implementation. They should probably be brought closer to conclusion before we embark on the pursuit of more bright, shiny objects, especially those that seem only to have the strong support of one and the qualified support of one other. I see more value in sub-categorizing Symbols as a necessary precursor to an intelligent discussion and consensus than in precipitous action. We have plenty of detritus from earlier projects, such as the "Shorthand" L3/4 header. We also have more important potential changes than finer breakdowns of headers, IMHO. DCDuring TALK 17:23, 19 May 2011 (UTC)
Punctuation marks are already categorized into Category:Translingual punctuation, Category:Japanese punctuation, etc. Are they a sufficient precursor for an intelligent discussion to you? By the way, I might suggest a move to Category:Japanese punctuation marks and Category:Translingual punctuation marks in the future.
DCDuring, if you want to discuss potential changes that you consider more important than this, be my guest. --Daniel. 17:33, 19 May 2011 (UTC)
I have a few specific questions the answers to which might help contributors who care to support the proposed change:
  1. Which inflection templates are now used under the Symbol header? For Translingual? For English? For other languages?
  2. Are there really mutually exclusive and collectively exhaustive subcategories for Symbol?
  3. Specifically, aren't symbols used as punctuation marks used for other purposes as well? (If so, we are adding headers, not replacing them, thereby adding to the length and possibly impairing navigability of such entries.)
  4. Do punctuation practices differ enough by language to warrant separate L2 sections for such Symbols?
  5. Should we have a rebuttable presumption of only Translingual punctuation sections, supplemented by Punctuation appendices for each language?
DCDuring TALK 17:54, 19 May 2011 (UTC)
1) Mainly {{infl}} or none; the latter often leads to no categorization of symbols at all. There are a number of different situations: Notably, the context template {{element symbol}} populates Category:Symbols for chemical elements with a number of Translingual entries that are not members of Category:English symbols. We have a template {{en-symbol}} which is virtually unused. I quietly created it. has a symbol header, above a Template:ko-hangul-symbol which categorizes the entry into the odd and nonexistent Category:Hangul.
2) They're not exhaustive. Our categorization of symbols is messy and inconsistent as a whole: Category:Punctuation by language is, however, coherent and navigable enough since I cleaned it up with help of other people, despite the lack of coverage of senses to be categorized; I could say the same about categories of letters.
3) There are certainly symbols that fit multiple POS headers. In particular, I added a "Conjunction" section to / because one sense is indeed a conjunction.
4) What?
5) There are non-Translingual senses of punctuation marks: Category:Japanese punctuation contains many Japanese-only or CJK-only symbols; @ has a number of definitions of multiple languages, some of them just tell the name of the symbol, others are more unique; and the only current definition of ¿ is a punctuation mark in Spanish. (However, the definition only mentions the shape, without explaining how it is used as a punctuation mark. It would be like defining "!" as a dot and a vertical line. The categorization and the usage notes, however, do explain shortly how the symbol is used.) --Daniel. 20:22, 19 May 2011 (UTC)

I tested the new header here as I mentioned, and it looks good. DCDuring, if you have more questions, feel free to ask; this invitation applies to other people as well. If no one objects, I'm going to ask Kassad to make his bot recognize "Punctuation mark" as a good header, like all other headers that we use. --Daniel. 13:58, 20 May 2011 (UTC)

Categories: "Simplified Chinese slang", "Simplified Chinese colloquial terms" etc

Uh... no. These should be "Mandarin slang in simplified script", "Mandarin colloquial terms in simplified script", etc. "Simplified Chinese" is not a language. (See 可咋整呀, 非死不可, etc.) Please rename these, it looks silly. ---> Tooironic 05:10, 16 May 2011 (UTC)

Is there any real reason to categorize these by script at all? We don't categorize other languages by script. What's wrong with just "Mandarin slang"? --Yair rand 06:22, 16 May 2011 (UTC)
Wiktionary has a long tradition of separating words between these two writing systems and not always being sure what to do with them. In particular, Category:Traditional Chinese language existed and got deleted, but we still have and use the [pseudo-]language codes {{zh-cn}} (Simplified Chinese) and {{zh-tw}} (Traditional Chinese).
However, categorizing words by their languages and scripts is not per se a bad thing, particularly now that we have relatively mature language categories, script categories and "hybrid categories of senses per POS per language per script". The system of Category:Mandarin nouns in simplified script is not perfect, but it is good and I like it. There are times when I, personally, would want to study one set of characters or another, particularly in Japanese. I only miss a Category:Mandarin nouns in pinyin or, even better, Category:Mandarin nouns in Latin script. --Daniel. 09:50, 16 May 2011 (UTC)

I do not understand how a term can be slang, colloquial, obsolete... based on its script. These categories make no sense and should be deleted. -- Prince Kassad 09:52, 16 May 2011 (UTC)

Can someone fix this soon? I just created 晨勃 using the usual templates and now the categories are all out of whack. Ugh! Who changed this? I don't remember there ever being a vote. I'm the one who created most of the entries in these categories, you'd think whoever implemented this change would have consulted me first. >.< ---> Tooironic 08:12, 17 May 2011 (UTC)

The vote was Wiktionary:Votes/2011-04/Lexical categories, moving lexicon categories using language code prefixes to use full names. Mandarin entries were categorized into the old categories by using context templates with the language being set to zh-cn or zh-tw, so when the templates were changed from categorizing into (code):Slang to categorizing into "(full language name of code given) slang", {{zh-cn}} and {{zh-tw}} expanded into "Simplified Chinese" and "Traditional Chinese", and categorized the entries like that automatically. --Yair rand 08:25, 17 May 2011 (UTC)
The phenomenon described by Yair above is rather a bug than something planned and endorsed by a vote.
The correct name, by comparison with many other Sinitic categories, would be (as mentioned before) "Category:Mandarin slang in simplified script", not "Category:Simplified Chinese slang". --Daniel. 08:33, 17 May 2011 (UTC)
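The mechanism described in the two comments above can be reproduced in a few lines. The following Python sketch is only a model: the lookup tables and function names are invented for illustration (including the assumption that `cmn` is the code for Mandarin), and the real templates are wikitext, not Python.

```python
# Model of how switching from code-prefixed to full-name categories produced
# "Simplified Chinese slang". All names and tables here are illustrative.

LANGUAGE_NAMES = {
    "en": "English",
    "cmn": "Mandarin",            # assumed code for Mandarin
    "zh-cn": "Simplified Chinese",  # pseudo-language code: really a script
    "zh-tw": "Traditional Chinese",
}

# Pseudo-language codes that actually denote a language plus a script.
SCRIPT_CODES = {
    "zh-cn": ("Mandarin", "simplified script"),
    "zh-tw": ("Mandarin", "traditional script"),
}


def old_category(code: str, label: str) -> str:
    """Pre-vote scheme: '(code):Label'."""
    return f"Category:{code}:{label.capitalize()}"


def new_category(code: str, label: str) -> str:
    """Post-vote scheme applied naively: '(full language name) label'."""
    return f"Category:{LANGUAGE_NAMES[code]} {label}"


def script_aware_category(code: str, label: str) -> str:
    """Naming suggested in the thread: language first, script as a suffix."""
    if code in SCRIPT_CODES:
        language, script = SCRIPT_CODES[code]
        return f"Category:{language} {label} in {script}"
    return new_category(code, label)
```

With `zh-cn` as input, the naive scheme yields the disputed name, while the script-aware variant yields the name proposed in the discussion.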
Can I just agree with Tooironic and Prince Kassad. Simplified/Traditional are orthographic systems; we don't have Category:Cyrillic Serbo-Croatian slang, for example. Mglovesfun (talk) 09:45, 17 May 2011 (UTC)
In fact, Tooironic and Prince Kassad expressed completely different opinions. So, no, Martin, you can't [completely] agree with them both as of now. (One person suggested renaming, the other suggested deletion.) --Daniel. 10:17, 17 May 2011 (UTC)
I think the templates {{zh-cn}} and {{zh-tw}} should be deleted. They aren't language codes, nor are they region or dialect-specific. —CodeCat 11:55, 28 May 2011 (UTC)
It seems to me that there is an aspect of this debate that is based on a faulty premise, which is that Wiktionary categories should always be based on language and not script. Please correct me if I'm wrong, but I do not recall ever seeing a formal policy on that. The Chinese contributors (more than one now) have told you that we find it useful to have script-based categories. What's wrong with having Category:Cyrillic Serbo-Croatian slang, if it proves useful? Furthermore, the zh-cn/zh-tw templates may not have been a perfect solution, but these ad hoc changes have created all kinds of unintended side-effects, such as thousands of red-link Simplified/Traditional categories, where perfectly good zh-cn/zh-tw categories once existed. Are the people that made the changes planning to fix all of these side effects as well? -- A-cai 12:20, 28 May 2011 (UTC)
I don't think there is anything going against having script-based categories. The problem is that we are using script-based codes as languages, so that Simplified Chinese is treated as an entirely different language from Traditional Chinese and Mandarin. It is fine if we have script categories, but they should be named something like Category:Mandarin slang in traditional script, not Category:Simplified Chinese slang. —CodeCat 12:24, 28 May 2011 (UTC)
Nothing's wrong with Category:Cyrillic Serbo-Croatian slang, what would be wrong is Category:Cyrillic slang and having all the Cyrillic script languages in a single category. AFAICT the system CodeCat is proposing above is already the one we use (such as Category:Mandarin idioms in simplified script). Mglovesfun (talk) 12:26, 28 May 2011 (UTC)
Your logic is sound. However, the question still remains. Who is going to fix the thousands of entries that still have zh-tw/zh-cn categories outside of templates? See 爸爸 for example. -- A-cai 12:30, 28 May 2011 (UTC)
P.S. I agree that Mandarin slang in simplified script is more technically correct. -- A-cai 12:44, 28 May 2011 (UTC)

Part of speech headers and headword lines


test (plural tests)

test, n., plural tests

Noun: test (plural tests)

Noun: test (plural tests)

NOUN - test (plural tests)

Template:pos n. test (plural tests)

^ Bunch of random possibilities, none of which are good enough IMO.

I see a number of possible areas for improvement of the standards for pos headers and headword lines:

  1. The POS headers and headword lines are displayed on two separate lines, wasting a lot of vertical screen space. Reducing this to one would greatly help usability, in my opinion.
  2. They require two lines of wikitext, which is also unnecessary.
  3. We have a lot of great grammatical appendices with helpful information about parts of speech in specific languages, but there aren't any links to them, afaik. Having, for example, the POS/headword lines for French nouns include a link to Appendix:French nouns somewhere in there would probably be useful.
  4. Headword lines really don't have enough standardization or machine readability, IMO.
  5. Headword lines are the most complex part of Wiktionary in terms of understanding how to edit it. Whether through a JS tool or just by simplifying the templates and format, this really needs to be dealt with.

I think we're going to have to rework part of speech headers and headword lines entirely at some point. Anyone have any ideas for possible new formats? --Yair rand 09:28, 16 May 2011 (UTC)

What we have now is a close correspondence among the "logical" structure of the entry, the structure of the display, and the structure of the wikitext that appears in the edit window. The benefits of saving screen space that may be obtained by combining the header and the inflection line come at the expense of reduced wikiness, that is, reduced accessibility of editing to low-frequency contributors. We have already experienced this with the elaborated template structure for the content of category pages, especially in its earlier incarnations. If you consider that those who develop and maintain such structures tend not to remain active for the entire life of their creations, you would have to believe in the desirability of keeping wiktionary-specific (let alone en.wikt-specific) complexity to a minimum. I have always wondered why we were so opposed to having links (or links that appeared on hovering) in headers. DCDuring TALK 12:28, 16 May 2011 (UTC)

Two years

When contributing to this revision of "Citations:Egyptic", I found a text that was a 2007 republication of a book from 1847. I mentioned both years, separated by a comma, in the same citation. If there was a more appropriate decision (such as formatting them differently or mentioning only one year), please let me know. --Daniel. 10:13, 17 May 2011 (UTC)

If it's something that's from 1847, find the original book and cite it. That's rarely going to be a problem for English books with Google Books and HathiTrust out there. Modern reprintings often change spellings and even update language, making them a tricky source for cites.--Prosfilaes 18:42, 17 May 2011 (UTC)

Wiktionary:Votes/2011-04/Derivations categories

After a fair amount of time and a number of threads devoted to discussing this idea, the proposal has become mature enough to be voted on. I scheduled the vote to start in a week. I invite you all to double-check the proposed set of new categories to be voted on. --Daniel. 13:24, 17 May 2011 (UTC)

Wiktionary:Votes/2011-05/Replacement for Xyzy, langscript, langfamily, langprefix and others

Since there has been a lot of discussion but no clear consensus, I have created a vote containing several proposals. Each can be voted on individually. —CodeCat 13:39, 17 May 2011 (UTC)

Dropping prefixes from language templates

Currently, a few language templates have prefixes, such as etyl:, proto: or conl:. This has always complicated things, since it means we need to distinguish between {{l}} and {{lx}} and so on. I would like to propose that we drop these prefixes, since there is no real reason to distinguish these templates by name and it just creates extra technical difficulties. It would make templates like {{etyl}} simpler, and also remove the need to distinguish {{languagex}} from {{language}}. As far as I know there is no overlap between etyl templates and the remaining templates, except for proto-languages. {{proto:gem}} refers to Proto-Germanic, while {{etyl:gem}} refers to the Germanic language family. I think this means we would need to find new language codes for proto-languages. Maybe {{gem-proto}}? —CodeCat 12:30, 17 May 2011 (UTC)
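The clash described in the post above can be illustrated with a minimal Python sketch. It is not how the templates themselves work; the sample list and the function names are hypothetical, and only the `gem` and `prv` examples and the `-pro` suffix come from this discussion.

```python
from collections import defaultdict

# Hypothetical sample of template names; only gem and prv appear in the discussion.
TEMPLATES = ["proto:gem", "etyl:gem", "etyl:prv", "en"]


def strip_prefix(name):
    """Return the part after the first colon, or the whole name if there is no prefix."""
    return name.split(":", 1)[-1]


def find_clashes(names):
    """Group names by their prefix-free form and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_prefix(name)].append(name)
    return {bare: full for bare, full in groups.items() if len(full) > 1}


def rename(name, proto_suffix="-pro"):
    """Drop the prefix; proto-languages get a suffix so they stay distinct from families."""
    prefix, _, bare = name.rpartition(":")
    return bare + proto_suffix if prefix == "proto" else bare


print(find_clashes(TEMPLATES))
print([rename(name) for name in TEMPLATES])
```

Simply stripping the prefixes makes `proto:gem` and `etyl:gem` collide, which is why the proposal needs a distinct form such as `gem-pro` for proto-languages.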

I support the proposal of dropping prefixes, and the proposal of creating {{gem-proto}} to fill that gap. --Daniel. 12:49, 17 May 2011 (UTC)
I just noticed that language subtags may only be three letters. So that would mean {{gem-pro}}? —CodeCat 13:16, 17 May 2011 (UTC)
"only three letters" is not a rule, but a convention. That said, I'd still prefer {{gem-proto}}, because it is more intuitive that way. --Daniel. 13:20, 17 May 2011 (UTC)
Language subtags can only be two or three letters, yes, but that's because "language subtag" doesn't mean what it might sound like! A "language tag" (such as fr-Latn-CH, Swiss French in the Latin script) is a sequence of one or more "language subtags" (fr, Latn, and CH, in this case). The first subtag is always the "primary language" subtag (fr). Subsequent subtags can be various things, such as a "script" subtag (Latn) or a "region" subtag (CH). —RuakhTALK 14:01, 17 May 2011 (UTC) edited 14:25, 17 May 2011 (UTC); changes indicated using <del> and <ins>
More/better information: 14:13, 17 May 2011 (UTC)
Oh, and I guess that by "language subtags" you mean the extended language subtags? You're thinking of gem-proto as analogous to zh-yue? The thing is, we're inventing our own subtags, so I think it's moot. If we want to be spec-compliant, then I think we have to use gem-x-proto. —RuakhTALK 14:25, 17 May 2011 (UTC)
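As a rough illustration of the tag structure described in the two comments above, here is a simplified Python sketch that labels each subtag by its shape alone. It is an assumption-laden toy, not a validator: it ignores subtag ordering rules, variants, extensions and the registry of real codes, and the function name is hypothetical.

```python
import re


def classify_subtags(tag):
    """Label each subtag of a language tag by its shape (simplified, not a full validator)."""
    parts = tag.split("-")
    labels = [(parts[0], "primary language")]
    private = False
    for sub in parts[1:]:
        if private:
            # Everything after the singleton "x" is private use.
            labels.append((sub, "private use"))
        elif sub.lower() == "x":
            private = True
            labels.append((sub, "private-use marker"))
        elif re.fullmatch(r"[A-Za-z]{4}", sub):
            labels.append((sub, "script"))
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub):
            labels.append((sub, "region"))
        elif re.fullmatch(r"[A-Za-z]{3}", sub):
            labels.append((sub, "extended language"))
        else:
            labels.append((sub, "other"))
    return labels


for example in ("fr-Latn-CH", "zh-yue", "gem-x-proto", "gem-proto"):
    print(example, classify_subtags(example))
```

Under these shape rules, `proto` in `gem-proto` fits none of the language, script or region slots, whereas in `gem-x-proto` it falls after the private-use marker.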
We already use {{gmq-osw}}, so {{gem-pro}} was supposed to be analogous to that. —CodeCat 14:37, 17 May 2011 (UTC)
Where did {{gmq-osw}} come from? Linguist List (which is in charge of non, the language code for Old Norse) gives non-swe as a "Private Use" code (!) for Old Swedish, so it seems like we should use that? —RuakhTALK 15:15, 17 May 2011 (UTC)
{{gmq-osw}} has a relatively subtle meaning: ISO code of North Germanic family + abbreviation of Old Swedish. That code has been listed at both Category:Old Swedish language and Wiktionary:Languages for ages. I oppose using "non-swe" in Wiktionary because we already have a good code; replacing it would be unnecessary work whose energy could be redirected to do something more useful... And all the Wiktionarian "private use" codes begin with ISO 639-5 anyway, not ISO 639-3, which is an ad hoc but beautiful tradition. --Daniel. 15:30, 17 May 2011 (UTC)
I've identified possible clashes of names, next to those between proto languages and language families:
This seems very minor, and could be easily solved I believe. —CodeCat 21:05, 17 May 2011 (UTC)
I think we'd need to change most of those etyl templates anyway: they're not suitable for use as language codes. —RuakhTALK 20:59, 21 May 2011 (UTC)
We could decide that we treat those 'wordy' codes the way we treat codes in {{langname}}. In other words, a code that contains uppercase letters is not a code but an etyl template, and it is a normal language template if it contains only lowercase letters. That way, only the bottom three would need to be fixed. —CodeCat 21:11, 21 May 2011 (UTC)
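The case-based rule suggested in the comment above could be sketched as follows. This is a Python illustration only; the function name is hypothetical, and the real check would have to live in template code.

```python
def is_language_code(template_name):
    """Under the suggested convention, a name with no uppercase letters is a language code."""
    return not any(ch.isupper() for ch in template_name)


# Names taken from this discussion: one lowercase code and one 'wordy' etyl name.
for name in ("gmq-osw", "ONF."):
    kind = "language code" if is_language_code(name) else "etyl template"
    print(name, "->", kind)
```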
So far there are a few proposals but no real consensus. So I am creating a poll here. 'Code' in this context refers only to templates that are only in lowercase, following the objections above.
Support dropping prefixes for codes, use -pro for proto-languages
  1.   Support, I prefer this option but -proto is fine with me too. —CodeCat 18:20, 23 May 2011 (UTC)
  2.   Support Daniel 18:39, 23 May 2011 (UTC) I prefer this, but I'm going to vote for the other as well because I don't mind supporting it. --Daniel 18:39, 23 May 2011 (UTC)
Support dropping prefixes for codes, use -proto for proto-languages
  1.   Support Daniel 18:39, 23 May 2011 (UTC)
Support dropping prefixes for codes, use some other format for proto-languages
  1.   Oppose, at least for now, because I don't really get the point. How is gem-proto (or gem-pro) any better than proto:gem? If the problem is that templates need to know whether to prepend proto:, then doesn't this just change things so they need to know whether to append -proto (or -pro)? —RuakhTALK 13:47, 29 May 2011 (UTC)
    I don't think so. Templates will never need to append "-pro" to "gem" to form "gem-pro" automatically, because (supposedly) people will have to type "gem-pro" by themselves. For example, in Category:Proto-Germanic nouns, instead of using the (arguably) ambiguous code {{poscatboiler|gem|noun}}, the new code {{poscatboiler|gem-pro|noun}} should be used according to the proposal. --Daniel 14:11, 29 May 2011 (UTC)
    Indeed, it would be like that. The problem with prefixes has always been that they could be dropped. If we agree to include the prefixes all the time that's fine, but then we also end up with lang="proto:gem" in our HTML code, which is not what we want. —CodeCat 14:13, 29 May 2011 (UTC)
    In that case I oppose because we shouldn't be putting invalid codes in our generated HTML. —RuakhTALK 14:50, 29 May 2011 (UTC)
    So which do you prefer then, gem (a family code, current practice), proto:gem (invalid, but the name of the template) or gem-pro (a valid code). —CodeCat 14:57, 29 May 2011 (UTC)
    I don't know how you decided that gem-pro was a valid code; it's not. But to answer your question — my preference would be either a valid code, if there is one (maybe lang="gem-x-proto"?), or else just lang="", which is a standard way of explicitly not indicating the language. —RuakhTALK 15:05, 29 May 2011 (UTC)
      Oppose, I don't get it. Furthermore, etyl templates are specifically set up not to be valid language templates, like {{etyl:prv}} or {{etyl:ONF.}}. If we drop the prefixes, what will stop them from being valid? Mglovesfun (talk) 14:16, 29 May 2011 (UTC)
    But they are valid language templates because they exist. I don't see how a prefix makes them magically invalid. I could write {{infl|etyl:prv|noun}} if I wanted to and it would work. Using prefixes to distinguish the template doesn't make any sense to me. We already have categories and policies for that. —CodeCat 14:36, 29 May 2011 (UTC)
    A prefix makes it magically obvious that they are invalid. And {{infl|etyl:prv|noun}} isn't guaranteed to work; it happens to work because {{etyl:prv}} happens to generate simply Provençal when its first argument isn't provided, but it would be perfectly fine for someone to change it to do something different in that case. —RuakhTALK 14:50, 29 May 2011 (UTC)
    But invalid for what purpose? Are we going to distinguish every possible use by changing the names of templates? Why don't we split up etyl: templates depending on whether they refer to families or languages? Why create technical barriers that solve a small problem but create a lot of bigger ones? —CodeCat 15:07, 29 May 2011 (UTC)
    I don't believe that any problems are caused by the special etyl: templates. At least, I'm not aware of any, and if you're aware of some, then I have to wonder why you've never mentioned them. —RuakhTALK 15:23, 29 May 2011 (UTC)
    I've already mentioned several problems, and those are only the problems we're aware of currently. It seems like a very ad hoc solution to a problem that didn't really exist in the first place, but because of that solution we have had to devise all kinds of workarounds. And those workarounds will only increase in number. Just because it works now doesn't mean it's not fundamentally bad design. —CodeCat 15:28, 29 May 2011 (UTC)
    You think you've mentioned problems; I think you haven't. I guess we'll have to agree to disagree. —RuakhTALK 17:18, 29 May 2011 (UTC)
    And the reason for dropping the prefix is because we are already using templates like {{langprefix}}, {{termx}} and so on, which do nothing except circumvent one technical barrier with another. And furthermore, there is a proposal in a vote that will change the way derivations categories are handled, which would greatly benefit from removing artificial restrictions like this one. —CodeCat 14:47, 29 May 2011 (UTC)
    I support orphaning and deleting most of those templates. They seem completely misguided. —RuakhTALK 14:50, 29 May 2011 (UTC)
    But to orphan them this change needs to pass, because as long as language code prefixes still exist, there needs to be a way to handle them. Those templates were created as a solution to that. And I distinctly remember you complaining that {{term}} should not be modified, so that is why all these parallel templates were created instead. —CodeCat 14:57, 29 May 2011 (UTC)
    Right, I retract my previous 'oppose'. Mglovesfun (talk) 15:00, 29 May 2011 (UTC)
    {{term}} should not support them and {{termx}} should not exist. Both approaches are misguided, because they're both misguided ways of solving a non-existent problem. —RuakhTALK 15:05, 29 May 2011 (UTC)
    I agree that {{termx}} should not exist, but if {{term}} could take over its function, it would be ok. —CodeCat 15:14, 29 May 2011 (UTC)

Name Nithi Watanachai

I'm not sure what kind of useful content might be put into the redlink, so I think we should create-protect it for the time being. TeleComNasSprVen 14:19, 17 May 2011 (UTC)

Poll: Renaming some metatemplates

I propose renaming a few metatemplates. I chose not to use WT:RFM because they are highly-used templates, and I chose not to create a vote because this is a relatively minor issue nonetheless; here is a good middle ground for that.

When these metatemplates were few and new, it was easier [to me] to remember them by their abbreviated names, but now I think it's better to change them to something longer and more intuitive. Any of these names most likely are not going to be often typed by people anyway; only [many] other templates use these metatemplates.

(Notably, there's the ongoing proposal of just deleting a number of these metatemplates to replace them by thousands of small ones. Meanwhile, since the templates still exist, I'll just stick with this proposal of improving them by renaming them.)

Now here's my simple question:

Can the existing names (listed below) be replaced by the proposed names (listed below too)?

Thank you. --Daniel. 18:22, 17 May 2011 (UTC)

Yes, rename them
  1.   Support Daniel. 18:22, 17 May 2011 (UTC)
  2.   Support Dan Polansky 10:27, 18 May 2011 (UTC), especially if it is true that the templates are only used in other templates. The short names are incomprehensible. It is so much easier to read code that uses self-explanatory names, even if they are a bit longer; else the reader of code who is not yet familiar with the names has to constantly check the documentation, or better check the code, as the doc is often misleading anyway. A comprehensible shorter name would be okay, such as "lang code to family code" or "lang code to lang cat name", but I see no disadvantage to long names given they are pasted. Using Lisp-like hyphens would have some charm, like "language-code-to-family-code". I admit that the shortness of "en-adj", which is used in the mainspace, is more justified for such an often used template. --Dan Polansky 10:27, 18 May 2011 (UTC)
  3.   Support We definitely need template naming guidelines. I personally prefer fully-expanded whitespaceless PascalCasing without pointless vowel-dropping abbreviations that save two milliseconds of typing but require additional effort in their memorization, usually at the expense of many faulty attempts to guess the exact API/template name. Lispish hyphens are also fine by me. This is not a particularly urgent matter since templates are rarely written, though it would be nice to have a certain degree of uniformity. So LanguageCodeToFamilyCode or Language-Code-To-Family-Code, or language-code-to-family-code in the worst case. --Ivan Štambuk 19:13, 21 May 2011 (UTC)
  4.   Support —RuakhTALK 20:57, 21 May 2011 (UTC)
No, don't rename them
  1.   Oppose see w:KISS principle -- Prince Kassad 18:48, 17 May 2011 (UTC)
    • The part "Keep it simple!" of the "KISS principle" would, in fact, be a reason to want longer names, because the shorter ones are less simple (in other words, they're more complex). --Daniel. 19:00, 17 May 2011 (UTC)
  2.   Oppose Templates are fine with short names, and in any case, we have a vote in progress that would make several of these templates obsolete. —CodeCat 18:53, 17 May 2011 (UTC)
  3.   Oppose. --Yair rand 19:29, 17 May 2011 (UTC)
      Weak oppose. Some of these short names are not very clear, but given the name of a template, it's easy to find out what it does (just look at the documentation), whereas given the thing that you're trying to do, it's not so easy to find the template that does it. I regularly have to resort to finding an entry that already uses the template that I know exists, but the details of whose name I can't remember. Something like "script" is easier to remember, and to get exactly right, than something like "script code to script name", no matter how much better the latter is in other respects. —RuakhTALK 20:30, 17 May 2011 (UTC)
    On second thought, I retract my opposition. Daniel. says above that "Any of these names most likely are not going to be often typed by people anyway; only [many] other templates use these metatemplates", and that does seem to be accurate. The templates in question are likely to be created by copying and pasting from existing templates regardless. —RuakhTALK 00:59, 18 May 2011 (UTC)
  4. I feel about the same as Ruakh. Mglovesfun (talk) 20:41, 17 May 2011 (UTC)
    That is, the bit he's now struck through. Mglovesfun (talk) 23:47, 18 May 2011 (UTC)
  5.   Oppose. Cannot see any advantage of it. (And yes, I've read the reasoning above.) -- Gauss 11:57, 22 May 2011 (UTC)
I am indecisive or indifferent
  1. A more descriptive name is not necessary if the description is in the /doc page. OTOH, it certainly can't hurt. On the first hand again, the short name is more useful for typing (even in other templates, and certainly in discussions). Thus, I don't particularly want the rename, but don't mind it provided the old name is kept as a redirect.​—msh210 (talk) 18:31, 17 May 2011 (UTC)
    "[M]ore useful for typing": and for remembering, as Ruakh notes above.​—msh210 (talk) 21:04, 17 May 2011 (UTC)

I'm very surprised by seeing relatively so many people opposing the new names, despite the older names being incomprehensible and virtually being used only by templates. (For instance, people use {{wikipedia}} which uses {{wikimedia language}}, but people don't use {{wikimedia language}} directly.)

I would really like to be able to use the comprehensible versions. Would you guys oppose (or support) the alternative idea of creating all these new names as redirect to the older names?

I know that creating redirects for templates is often uncontroversial, but I also thought I "knew" that renaming these templates would be uncontroversial too in the first place. I'm scared. --Daniel. 03:47, 21 May 2011 (UTC)

Well, the proposal failed (as of now) and nobody complained about the alternative of creating redirects. So I'm going to create some redirects. --Daniel 17:20, 23 May 2011 (UTC)

hyphenated words

¶ We have a category for words spelt with a hyphen‐minus: [3]. It has often been ignored, though. Should we categorize any English terms which contain that character? --Pilcrow 19:17, 18 May 2011 (UTC)

Category:English terms spelled with + contains a few members; Category:English terms spelled with 6 contains a few dozen members; and Category:English terms spelled with Æ contains a few hundred members.
I think the category you proposed would easily become the most populated of them. Well, I don't mind having a category filled with many, many English terms spelled with a hyphen, especially since they can't be listed through Special:Search anyway.
In long term, I think it would be useful to create eventually subcategories, or appendices, or lists of derived terms, or something else, like these:
  • "Category:English terms spelled with - as a substitute for letters" (containing G-d)
  • "Category:English terms spelled with - meaning minus" (containing A-)
And so on. Meanwhile, I support the proposal of populating Category:English terms spelled with -, preferably without any prefix, suffix or other affix if possible, because they can already be found at their respective categories (which can be linked in one way or another to the category to be populated). --Daniel. 23:34, 18 May 2011 (UTC)
Affixes aren't spelled with hyphens; it's a dictionary convention to show the difference between a word (such as bi) and an affix (such as bi-). Oh and in my opinion, let's stick to characters, not what they mean; G-d and A- are both spelled with the same character. Mglovesfun (talk) 23:46, 18 May 2011 (UTC)
I still think it was a good idea to ask specifically not to add affixes because they have to be taken into consideration in case Pilcrow (or anyone else) populates the category in question by using a bot. --Daniel. 00:06, 19 May 2011 (UTC)
Yeah I mean don't include affixes, as they are not spelled with - but by convention contain hyphens in the title. Thus, excluding them is very much rational, whereas A-, to-day and G-d are spelled with -. Mglovesfun (talk) 00:27, 19 May 2011 (UTC)
To-day and A- are not really written with the same character; the former is written with a hyphen, the latter with a minus sign. People online normally use the same Unicode character for both (the hyphen-minus, inherited from ASCII), but in handwriting and in printed works, they look noticeably different. (Not incidentally, Unicode does have a dedicated hyphen, U+2010, and a dedicated minus sign, U+2212, which normally have different glyphs, ‐ vs. −, though I don't advocate using them in our entry-names.) G-d is trickier; I write it more or less as a hyphen, but some people write it more like a dash, which then ends up looking somewhat more like a minus sign than like a regular hyphen (IMHO). —RuakhTALK 02:12, 19 May 2011 (UTC)
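The three characters mentioned in the comment above can be inspected with Python's standard `unicodedata` module. This is a small sketch; the helper names are hypothetical, and the sample titles are the ones discussed in this thread.

```python
import unicodedata

# The ASCII hyphen-minus and the two dedicated characters named above.
CHARS = {"hyphen-minus": "\u002d", "hyphen": "\u2010", "minus sign": "\u2212"}


def describe(ch):
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def has_hyphen_minus(title):
    """True if an entry title contains the ASCII hyphen-minus (U+002D)."""
    return "\u002d" in title


for ch in CHARS.values():
    print(describe(ch))

for title in ("to-day", "A-", "G-d"):
    print(title, has_hyphen_minus(title))
```

A check like `has_hyphen_minus` only sees the character, not its meaning, so it cannot tell the hyphen in to-day from the minus in A-, nor an affix title such as bi- from a real spelling.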
The proposal passed: this was a nice conversation comprised of a number of subjects involving hyphens and similar characters, and none of them involved answering "No!" to Pilcrow's initial question.
I believe Category:English terms spelled with - can be populated as suggested. --Daniel 18:52, 24 May 2011 (UTC)

What part of speech is a verb with an incorporated object?

In Biblical Hebrew, a direct object pronoun can be a suffix of a finite verb, e.g. וַאֲבָרֶכְךָ "and I will bless thee". In Old Irish, a direct object pronoun can be a suffix or an infix of a finite verb, e.g. atomchí "(s/he) sees me" (cf. ad·cí ((s/he) sees)). In many Native American languages, even direct object nouns can be incorporated into a finite verb, e.g. Oneida waʼkenaktahninú: "I bought a bed" (the nakt is the part that means "bed"). So what POS are such forms? Verbs? Should such entries be categorized as "Foobar verb forms"? —Angr 09:22, 19 May 2011 (UTC)

Hebrew אֲבָרֶכְךָ‎ is definitely a verb form; I think וַאֲבָרֶכְךָ might be SOP, though. (Actually, I'm a bit confused; in Genesis 12:2 it means "and I will bless thee", as you say it does, but now that I think about it, it looks to me like a vav consecutive form that should mean "and I have blessed thee" . . . needless to say, Biblical Hebrew is not my strong suit!) We don't yet have form-of templates for Hebrew verbs with direct object pronouns, but see the entry for קִדְּשָׁנוּ(kid'shánu) for one idea how they might look. I assume the same is true in the other languages you mention. Even in English, when we include idioms of the form verb+object (have a cow and kick the bucket and so on), we classify them as "verbs", since we don't have a separate "verb phrase" POS header. By the way, I'm not sure whether we should include Oneida verbs with incorporated direct objects. —RuakhTALK 12:01, 19 May 2011 (UTC)
Yeah, I wasn't worrying about the vav consecutive, which probably shouldn't be included in entries. קִדְּשָׁנוּ(kid'shánu) is exactly what I was thinking of. As for Oneida and other languages (lots of American languages do this), I'm a little torn. On the one hand, they really are rather sum-of-parts, and multiplying every transitive verb by every noun (even if it's only every noun that can reasonably be a direct object of that verb) will result in millions of forms in each language that does this. On the other hand, waʼkenaktahninú: is a single word of Oneida just as much as קִדְּשָׁנוּ(kid'shánu) is a single word of Hebrew. But whether or not to include incorporated noun objects is an issue for a different discussion. For now, it's enough for me to know that I can call atom·chí a verb and put it in Category:Old Irish verb forms. —Angr 12:17, 19 May 2011 (UTC)

Lemma form of Proto-Indo-European verbs

Currently, we don't have any lemma form of verbs in PIE. We name the entries after the root or stem. However, we do have lemmas for nouns and adjectives. I realise the reason we don't have lemmas for verbs is because PIE had no infinitive, but we could use the third-person singular form (the only form every verb has). I would like to propose that if the third-person singular form of a PIE verb is known, that we use it as the entry name rather than the root. —CodeCat 14:30, 21 May 2011 (UTC)

It's more complicated than that. For a lot of IE verbs, it's clear what the root is, but it isn't clear what the present stem is. Maybe the Sanskrit has a -ske/o- suffix, the Greek a -ye/o- suffix, the Latin a nasal infix, etc.; then we just don't know which formation was used in the parent language. I think it's best if we stick to roots for verbs, just as other dictionaries like Pokorny and Watkins do. —Angr 14:42, 21 May 2011 (UTC)
I said if the form is known. If there is a conflict that doesn't seem resolvable, we should stay with the longest known part. But a verb like Template:termx could easily be moved to Template:termx because the reconstructions match, and similar for Template:termx to Template:termx. —CodeCat 14:57, 21 May 2011 (UTC)
95% of all roots don't have a single stem form/conjugation class that can be reconstructed as the "original one". In particular because no such thing as the "original paradigm" can be reconstructed due to the great chronological disparity that exists among attested languages. Additionally, there are multiple, conflicting theories of PIE verbs, and the issue of inflectional endings, not to mention the nature and number of paradigms, is by no means settled. While the inflectional tables listed at our PIE and Proto-Germanic noun, adjective and verbal roots pages do give the impression of a coherent, singular language with a great deal of regularity, they merely represent a POV of a single Indo-Europeanist (Donald Ringe), and not a general scholarly consensus because no such exists. In case of the mentioned two roots, for example, LIV also lists an athematic reduplicative present stem for *bʰer- on the basis of Sanskrit bibharti, and for *h₁es- also a *-sḱe/o- form (on the basis of Palaic, Epic Greek and Tocharian B). The established practice in all the PIE dictionaries has been to use the e-grade root as the lemma, and I see no reason to depart from this widely adopted standard. Reconstructed entries are not attested by definition, and there is no need to impose the mainspace attestation prerequisite on them. PIE had no prefixes, so they can't be mistaken for one anyway.
PS: There exists the same problem with Sanskrit verbs, for which there are 10 conjugational classes, the root (dhātu) form is commonly listed in the dictionaries, but since the root form is never actually used in the running text we (I..) have chosen to use the 3rd-person-singular present active form (which is never used as a lemma in dictionaries, but is commonly listed as a citation form in the literature), but which can be a problem because lots of words undergo several conjugational classes, in particular in older texts (Vedic age), and only later (in the classical period) have many roots become commonly attached to certain paradigms, oftentimes even a newly invented one (there is a lot of artificiality in Sanskrit!), which is of course a problem since we want to be biased in favor of the older, "original" paradigms, but which have much lower relative usage in the corpus. --Ivan Štambuk 19:01, 21 May 2011 (UTC)
I wasn't aware there was still so much disagreement among scholars, I thought certain verbs had been pretty much worked out completely. But just so you know Template:termx is a prefix in PIE! —CodeCat 19:36, 21 May 2011 (UTC)
That's scholars for you! Mglovesfun (talk) 12:52, 22 May 2011 (UTC)
Yep, almost as bad as Wiktionarians! :p —CodeCat 13:30, 22 May 2011 (UTC)

Category:Flemish language

This has failed RFDO but nobody knows what to do with its contents. Some say merge Flemish into Dutch, some say rename it to West Vlaams, essentially keeping the entries as they are but with new language headers and new categories. It's more or less the same situation as with {{Xyzy}} - we can agree on what we don't want, but not what we want! Mglovesfun (talk) 12:38, 22 May 2011 (UTC)

Regardless of what we do with the contents, the code 'vls' should be used to refer to West Flemish, because that is the only part of Flemish that is recognised as a language as far as I know. —CodeCat 12:47, 22 May 2011 (UTC)
Ethnologue has it as Vlaams, while EncycloPetey was arguing outright deletion, merging the entries into Dutch. Lots of hurdles here, I'm afraid. Mglovesfun (talk) 12:51, 22 May 2011 (UTC)
I created this. I hope it helps. Have a happy discussion. --Daniel. 13:27, 22 May 2011 (UTC)
Since they share the same standard, isn't it parallel to Serbo-Croatian? The modern political connotations only really arise because 'Nederlands' now refers to a country, 'Nederland'. However this is quite recent, and not so long ago 'the Netherlands' included Belgium. —CodeCat 13:37, 22 May 2011 (UTC)

Adding the prefix en: to English topical categories

Currently the structure of topical categories doesn't match other language-specific categories. We categorise non-English topical categories as subcategories of the English ones. To me, that seems rather strange and confusing. So I would like to propose that we categorise all English topical entries in a category containing the language prefix en: just like other languages. The remainder of the categories, including the categories with no prefix, would stay the same, except that they would of course no longer contain English entries. —CodeCat 13:35, 22 May 2011 (UTC)

I like this old proposal. --Daniel. 13:39, 22 May 2011 (UTC)
Whereas this is the English Wiktionary, we have many skilled technical adepts, making life simple for English-language contributors makes the project more open to new contributors, topical categorization is no more meritorious of restriction as to contributors than other content, this is a bad idea. DCDuring TALK 15:27, 22 May 2011 (UTC)
(The last message may need some clarification. Here it is. Feel free to correct me if inaccurate.) DCDuring, in the past, if I remember correctly, has defended the notion that the "en:" at the start of category names makes using them more difficult for potential contributors who only speak English. --Daniel. 15:35, 22 May 2011 (UTC)
Support. Currently, Category:Plants has six "real" subcategories, such as Category:Trees, but good luck finding them: they're mixed in with one hundred seventy categories of the form [[Category:foo:Plants]]. All of our topical category structure is nice and easy to navigate — except the English ones, because they're forced to contain the hundreds of non-English ones at every level. (In other words: I think DCDuring has it exactly backward!)
That said, I don't think we should be using language codes at all. Something like Category:Plants (Mandarin) is much more readily understood than something like Category:cmn:Plants. So really what I support is using the same system for English as for other languages, which currently means inserting en: but will hopefully someday mean something better.
RuakhTALK 15:46, 22 May 2011 (UTC)
It is by no means necessary that both English entries and the language-specific topical categories be members of the same category. That is merely how we have done it so far. If we had initially designed our categories from the point of view of users and made ordinary users the main focus now we would undoubtedly do many things differently. Now seems like a good time to start. DCDuring TALK 16:49, 22 May 2011 (UTC)
The huge upper space dedicated to listing foreign-language versions of Category:Plants makes users (or me, with my current 1024x768 resolution, at least) have to scroll to see the actual English entries, which is a bad thing too. --Daniel. 16:08, 22 May 2011 (UTC)
I agree that using language codes in categories meant for non-technical users isn't good, but that's not the problem that's to be solved right now. I should make a note though. Under the new proposal, the subcategories and language-specific categories would still be mixed together. However, this would now only be an 'umbrella' category. Category:Communication would still have all the categories it has now, along with the newly created Category:en:Communication, but with no entries. So we have merely moved the difficulty-to-navigate from English to the umbrella categories, it hasn't gone away. That can be solved in different ways (perhaps by following the example of lexical categories). —CodeCat 16:10, 22 May 2011 (UTC)
Re: "So we have merely moved the difficulty-to-navigate from English to the umbrella categories, it hasn't gone away": That's a good point, but I don't see it as a major issue. As long as it's easy to navigate within the English categories, I don't see that anyone will really need the umbrella categories except when specifically trying to navigate among languages. —RuakhTALK 18:13, 22 May 2011 (UTC)
I don't see a problem in making small, incremental changes when we can, either. It allows us to fine-tune as we go. :) —CodeCat 18:25, 22 May 2011 (UTC)
Topic categories are very difficult to use at the moment, with all categories of all languages thrown together at the bottom of the entry, with nothing but language codes to differentiate the languages' categories from each other (not that this would be significantly improved by using language names...). If TabbedLanguages ever gets implemented, category sorting would make topic categories much more usable, and it might also make sense to remove the categories' prefixes from the display. --Yair rand 16:25, 22 May 2011 (UTC)
I still support this - I was the one that first wrote a vote on the subject, which 'failed' with a 62.5% approval rating. Mglovesfun (talk) 19:53, 22 May 2011 (UTC)
The vote in question was opposed by, among other people, EncycloPetey, Carolina wren, Robert Ullmann and Razorflame, who have been away from Wiktionary. I'm curious to see what would be the result of a new vote on the same subject, especially since everyone who supported the first vote is still active today. --Daniel. 20:02, 22 May 2011 (UTC)
I've created a second vote, since it looks like there is enough support for it to pass: Wiktionary:Votes/pl-2011-05/Add en: to English topical categories, part 2. —CodeCat 20:38, 22 May 2011 (UTC)
The vote has now started! —CodeCat 00:12, 29 May 2011 (UTC)
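
As a rough illustration of the navigation problem discussed in this thread, the following Python sketch separates an umbrella category's subcategories into plain topical ones (such as Trees) and language-prefixed ones (such as cmn:Plants). The function name `split_subcategories`, the regular expression used to recognise language codes, and the sample names are assumptions made for illustration only; this is not how MediaWiki or any Wiktionary template actually works.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern: a lowercase 2-3 letter code, optional "-xxx" parts, then a colon.
LANG_PREFIX_RE = re.compile(r"^([a-z]{2,3}(?:-[a-z]+)*):(.+)$")


def split_subcategories(names: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split subcategory names into unprefixed topics and per-language topics."""
    topical: List[str] = []
    by_language: Dict[str, List[str]] = {}
    for name in names:
        match = LANG_PREFIX_RE.match(name)
        if match:
            code, topic = match.groups()
            by_language.setdefault(code, []).append(topic)
        else:
            topical.append(name)
    return topical, by_language


if __name__ == "__main__":
    real, per_language = split_subcategories(
        ["Trees", "cmn:Plants", "en:Plants", "Flowers"]
    )
    print(real)
    print(per_language)
```

Under the proposal, English would simply become one more key (`en`) in the per-language mapping instead of sharing a category with the unprefixed topics.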

When is something a topical category?

I've heard a few people say that a topical category concerns what a word is about, and a lexical category concerns what (kind of word) it is. We currently have Category:Demonyms, which is a topical category. And there are several others like it. However, a word isn't about a demonym, the word itself is a demonym. But demonyms are words about places. Is there a kind of rule we can apply here, so that we know what is topical and what is lexical? —CodeCat 14:15, 22 May 2011 (UTC)

This distinction is hard sometimes and is not yet well formed. Here is a simple explanation for what is used in practice but is ugly and very unintuitive in theory.
  • Demonyms are words for inhabitants of places, such as "Alabaman" or "Brazilian".
  • "Category:English demonyms" would be a lexical category with words that are demonyms.
  • "Category:Demonyms" would be an incorrect name for a lexical category whose correct name would be "Category:English demonyms".
  • "Category:Inhabitants" would be a topical category with words for inhabitants of places.
  • "Category:English inhabitants" would be an incorrect name for a topical category whose correct name would be "Category:Inhabitants".
  • If "Category:English demonyms" and "Category:Inhabitants" existed simultaneously and without further subcategories, they could have exactly the same entries.
--Daniel. 14:51, 22 May 2011 (UTC)
So if I understand it correctly, lexical categories relate to the word and its usage itself, while topical categories relate to its meaning? And there are cases like the one above that the same 'field' can be covered both lexically and semantically? —CodeCat 14:54, 22 May 2011 (UTC)
There are disagreements about that big subject. For example, I've heard a few times one or more people saying that we shouldn't have topical categories containing members of only one part of speech (in this case, there are adjectives and nouns), but that proposed rule would negate the existence of Category:Days of the week and Category:Languages. On the other hand, we have "Category:French cardinal numbers" instead of "Category:fr:Cardinal numbers". --Daniel. 15:00, 22 May 2011 (UTC)
I didn't think that a category had to be one or the other, rather than both. Syntax and semantics are not mutually exclusive categories in the real world. Why would they have to be here? Why would we have to have duplicate categories for categories that had both semantic and syntactic elements? DCDuring TALK 15:37, 22 May 2011 (UTC)
I believe nobody has defended the ideas of syntax and semantics being mutually exclusive in the real world, or forcing them to be mutually exclusive in Wiktionary, or having duplicate categories as described above. My example about "Category:English demonyms" and "Category:Inhabitants" was a means of criticism, not a suggestion of a practice to be implemented here. --Daniel. 15:45, 22 May 2011 (UTC)
CodeCat opened this conversation with a mention of just that distinction, albeit using the word "lexical" where "syntactic" better reflects the actual nature of most such "lexical" categories. DCDuring TALK 16:49, 22 May 2011 (UTC)
This distinction does exist, because it has been widely implemented; and it works well for many categories. (We don't want the entry degrees of comparison categorized into Category:English adjectives, after all.) We (the three of us) are just pointing out that the distinction does not work well for all categories. --Daniel. 19:54, 22 May 2011 (UTC)

Definitionless, almost contentless Chinese characters

What do we do with things like 黄砂? There are many, many more where that came from: see Category:Mandarin definitions needed and the ones for Cantonese, Wu, Middle Chinese, Vietnamese, etc. It seems to go back to NanshuBot (talk • contribs), who as early as 2003, right at the dawn of Wiktionary, created thousands upon thousands of CJKV character entries with essentially no content. Just headers and numbers of strokes. Is it better to leave these as they are, hoping that eventually some generous souls will fill in the 20 000 or so missing definitions, or is it better to delete the empty sections? --Mglovesfun (talk) 15:20, 23 May 2011 (UTC)

While it might not be what we would like, what we currently have is better than nothing at all. I don't think we should remove useful information just because there isn't more information. —CodeCat 15:24, 23 May 2011 (UTC)
Some of us can remember when these things made up over 50% of this wiki. I argued with Connel (I think) for their removal, but to no avail. SemperBlotto 15:31, 23 May 2011 (UTC)
I basically agree with CodeCat. The widely implemented rule of "either create a well-formatted and well-defined entry, or don't create any entry at all!" has its charm, but I too don't think we should remove useful information just because there isn't more information. --Daniel 15:46, 23 May 2011 (UTC)
Referring specifically to 黄砂, the Japanese and Mandarin entries are clearly fine, good in fact, but what about the other five sections? Do any of them offer useful information? Useful to who? --Mglovesfun (talk) 16:00, 23 May 2011 (UTC)
I, personally, like to study romanizations; however, you're right, these sections are too incomplete as of now. I naturally would probably just seek other resources if I tried to study this entry and its romanizations.
With that in mind, I propose something else: Moving all these incomplete language sections to the talk pages, probably under a discussion header "Incomplete language sections". That way we don't "lose" information while we wait for more complete entries. --Daniel 16:21, 23 May 2011 (UTC)
What I should have said was it depends how you define 'useful information'. Pretty sure I once removed a Korean section that was just:


I think that would count as no usable content given. The Wu section of the entry above is just:

黄砂 ɦuaŋ so
Mglovesfun (talk) 23:51, 23 May 2011 (UTC)
It's because I'm trying to clear User:Yair rand/uncategorized language sections/Not English. The remaining Mandarin, Cantonese, Ga and Middle Chinese ones for the most part don't even have headers, so there's no way I can categorize them. Some of them could be categorized as 'Han characters', though 'Han character' doesn't tell you how it's used, a bit like saying that I is an English letter, that doesn't tell you it's also a pronoun. --Mglovesfun (talk) 10:41, 24 May 2011 (UTC)
The romanization is useful information. Of course the entries aren't complete, and of course a header that says "Korean" without even a romanization under it can be removed, but people might well want to know that 黄砂 is pronounced "ɦuaŋ so" in Wu without having to look on the talk page to find it. —Angr 15:35, 24 May 2011 (UTC)
Sounds like a consensus. --Mglovesfun (talk) 12:10, 27 May 2011 (UTC)
Mglovesfun, please don't destroy what was done by others. The entries have useful information, although incomplete. The Chinese characters may not be easily categorised under parts of speech, but the language-specific pronunciations, generic meanings, stroke orders, etc. are all important. I have fixed Mandarin 黄砂 and recreated the deleted 黃砂. --Anatoli 02:46, 29 May 2011 (UTC)
The solution here is to fill these with content. It will take time, but eventually it will be done. Cheers! bd2412 T 05:38, 29 May 2011 (UTC)
Vietnamese and Korean actually belong to the traditional section, the dialects belong to both, like Mandarin. --Anatoli 10:59, 29 May 2011 (UTC)
[quoting Anatoli] "Mglovesfun, please don't destroy what was done by others." Huh? Have you read the debate? The consensus is to keep them. Nitpick: not done by 'others' but by a bot, unless by 'others' you're including non-human others. Mglovesfun (talk) 11:02, 29 May 2011 (UTC)
Sorry if I was rude.
I have fixed/checked most language entries for 黄砂 and 黃砂 - Mandarin, Japanese, Wu, Hakka, Cantonese, Vietnamese and Korean, added Korean and Vietnamese in proper scripts. Perhaps they can be used as samples. --Anatoli 11:23, 29 May 2011 (UTC)
Not rude, I think you may have assumed I wanted to delete them, which given my past history, is a fair assumption. Albeit not correct this time. Mglovesfun (talk) 11:31, 29 May 2011 (UTC)

How to mark Japanese Pitch Accent, revisited (for batch import)

I'm in the process of compiling a list of Japanese words with pitch accent, extracted from the Japanese Wiktionary. I'm very excited about this, since this has always been a glaring omission from the freely available Japanese dictionaries. It'll probably look something like the following: (I also plan to clean up カタカナ vs ひらがな, めい vs めー etc.)

  1. 湖辺 こへん [0]
  2. 真勇 しんゆー [0]
  3. 厩舎 きゅーしゃ [1]
  4. 婦人病 ふじんびょう [0]
  5. 氏名 しめい [0]
  6. どろ ドロ [2]
  7. どれ どれ [1]
  8. 悲嘆 ひたん [0]
  9. ささやく ささやく [3]
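
A minimal sketch of how lines in the format above might be read for a batch import, written in Python. It assumes each line is "headword reading [accent]" with an optional list number in front, and it folds katakana readings to hiragana as one possible form of the clean-up mentioned above. The names `PitchEntry`, `parse_entry` and `katakana_to_hiragana` are hypothetical; nothing here reflects an existing Wiktionary tool.

```python
import re
from typing import NamedTuple


class PitchEntry(NamedTuple):
    headword: str
    reading: str  # reading normalised to hiragana
    accent: int   # downstep position; 0 means no downstep


# Optional "3." list number, then headword, reading, and "[n]".
LINE_RE = re.compile(r"^\s*(?:\d+\.\s*)?(\S+)\s+(\S+)\s+\[(\d+)\]\s*$")


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def parse_entry(line: str) -> PitchEntry:
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    headword, reading, accent = match.groups()
    return PitchEntry(headword, katakana_to_hiragana(reading), int(accent))


if __name__ == "__main__":
    for raw in ["  3. 厩舎 きゅーしゃ [1]", "  6. どろ ドロ [2]"]:
        print(parse_entry(raw))
```

The long-vowel mark ー lies outside the shifted range and is kept unchanged, so the めい vs めー question would still need a separate decision.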

Then I'll have to match them to existing definitions in English Wiktionary, which brings a whole new slew of problems (e.g. 囁く vs ささやく above). Oh well.

Now, as to formatting. Wiktionary:About_Japanese doesn't mention pitch accent. Also, there don't seem to be a whole lot of entries with pitch accent in English Wikipedia to go from, so there seem to be no guidelines really. Currently, some entries use romaji or IPA only. Some use ↗ and ↘ like in Japanese Wiktionary. User:友枝真樹 has written a nice template that some articles use.

There has been some discussion on this topic, notably here, here and here (how Japanese dictionaries mark it).

Personally (as mentioned before), I'd prefer the number-system, combined with the line-marks-high as seen in the NHK accent dictionary. This in addition to any IPA/romaji. The syntax for the number-system is easy, copy-paste friendly and allows for a link with an explanation. The NHK line-system allows you to see at a glance where the down-step happens, which is nice. Like this:

  • 氏名 しめい [0]


  • 箸 はし [1]


  • 橋 はし [2]


  • 端 はし [0]


  • サンドイッチ [4]

I'm planning to write a new template to make this work (based on the one written by User:友枝真樹). This would be one way to do it:


Another way that has been suggested is to use two Unicode characters that are designed to mark Japanese pitch accent: "˹" and "˺". Personally though, I think ヘ゛ベ and ペ look too similar. See below for how I imagine these could be used:

  • 氏名 し˹めい [0]


  • 箸 は˺し [1]


  • 橋 はし˺ [2]


  • 端 は˹し [0]


  • サ˹ンドイ˺ッチ [4]
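
The corner-mark notation above can be derived mechanically from the kana and the accent number. The Python sketch below is one way to do it, assuming the usual Tokyo-type pattern: the pitch rises after the first mora unless the accent is [1], and falls after the mora named by the number. The names `split_morae` and `mark_pitch` are hypothetical, and the mora splitting is simplified (small ゃゅょ-type kana attach to the preceding kana; ッ, ン and ー count as morae of their own).

```python
from typing import List

# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> List[str]:
    morae: List[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def mark_pitch(kana: str, accent: int) -> str:
    """Insert ˹ where the pitch rises and ˺ where it falls."""
    morae = split_morae(kana)
    if not 0 <= accent <= len(morae):
        raise ValueError("accent number exceeds the number of morae")
    out: List[str] = []
    for position, mora in enumerate(morae, start=1):
        out.append(mora)
        if position == 1 and accent != 1 and len(morae) > 1:
            out.append("˹")  # rise after the first mora
        if position == accent:
            out.append("˺")  # downstep after the accented mora
    return "".join(out)


if __name__ == "__main__":
    for kana, accent in [("しめい", 0), ("はし", 1), ("はし", 2), ("サンドイッチ", 4)]:
        print(kana, accent, mark_pitch(kana, accent))
```

Note that this sketch always marks the rise explicitly, so for 橋 [2] it produces は˹し˺ rather than the shorter はし˺ shown in the list.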

Any comments? (Updated) --Vaste 17:00, 25 May 2011 (UTC)

I agree with you, the alternative way is too similar and it is confusing. The way I learned (many years ago) was to use acute accents on the Romaji transliteration, such as sáke, saké, hána, haná. For me this way (possibly because I’ve known it for so long) is by far the easiest. The thing that bothers me about the しめい method is that it seems to require more knowledge of and facility with Japanese writing, but a lot of our users are beginners in the language. しめい would be good for those who know the language well, but for beginners, I think it would be easier to deal with this: Sono musumé o míreba, míru hodo kírei da to omoimásu. Of course, this way also requires learning some basic rules. Maybe both ways can be incorporated together. —Stephen (Talk) 22:17, 25 May 2011 (UTC)
I am pessimistic about the project. My reason is that to enter the pitch accent accurately and consistently, we need people with a good knowledge of the subject and good references. A transliteration without the pitch accent may also mislead users. I've got a couple of Japanese-Russian dictionaries, which use two of the above-mentioned methods, but in reality most people (e.g. teachers and learners) and dictionaries ignore the Japanese pitch accent. Even if dictionaries provide it, Japanese learners pick up how to pronounce the words, or rather full sentences, by listening and speaking, by immersion, not by relying on dictionaries, because one needs some training with a native speaker to follow the pitch accent. People who get this kind of training may no longer need dictionaries with pitch accents. And another thing: failure to follow the correct pitch accent seldom causes misunderstanding. There are a few regional variations in Japan; again, these seldom cause any problems. The stories about different pronunciations of はし are a notable exception, but this happens when one really wants to confuse, hence there are some puns and games based on words pronounced similarly but with different pitch tones. --Anatoli 01:31, 26 May 2011 (UTC)
It is certainly something that's neglected, especially by Western students. It's often not taught at all, or only mentioned in one lesson as something that exists. Many are probably completely unaware of it, I'd say. We simply don't hear/notice the difference, and since it's not marked in writing you won't easily notice it on your own either. Also, you do just fine communicating without it. Just like I think Japanese speakers with a heavy English accent are easy enough to understand, once you get used to it. This, however, doesn't mean good pronunciation should be ignored and not studied!
A funny thing also is that after studying Chinese, a tonal language, the presence of accent suddenly seemed much more obvious to me. So presumably Chinese/Vietnamese/Thai etc should notice this more easily. --Vaste 08:05, 26 May 2011 (UTC)
Making the information available does not obligate a reader to pay attention to it or learn it. If the information is there, anyone who wishes to ignore it can still do so. It’s no different from adding gender designations to German nouns ... if a user only wants to see the definition of a German noun, but does not care about its gender, then he is free to ignore that part. But if we make the information available, then the users who do want to know the gender of a German noun, or the pitch accent of a Japanese word, can get the information that they want. Each bit of linguistic information that we provide for any word in any language, whether spelling, definition, synonym, declension, or just an anagram, is only there for those users who are interested in that particular linguistic information. Any reader who does not care about it is welcome to ignore it.
The issue is how (in what form and manner) to show the pitch accent for those users who, like me, are interested in it. If this were the Japanese Wiktionary, all the users would be proficient in Japanese and there would be no need for Romaji transliterations. But this is the English Wiktionary, and a lot of our users do not have advanced knowledge of the Japanese language or writing, so a different way of presenting the information is worth considering. —Stephen (Talk) 06:55, 26 May 2011 (UTC)
I think it's reasonable to assume that any serious student interested in the pitch accent of a word at least would know the kana. Or am I wrong here? My intention from the start was to augment the kana-part of the entries. I still think it'd be nice to include IPA and/or romaji, but I wasn't planning on doing that myself in this initial import. Maybe it could be automatically generated from the kana though? Perhaps even changing the template would be enough...
Besides, if marking them on romaji, I'm unsure of what system to use. Besides accents like you showed, I remember seeing something similar to サンドイッチ in romaji somewhere. (Martin's dictionary?) Another point is that many users are taught to dislike romaji, refusing to use it or at least preferring not to use it. You might consider it silly, but it's worth considering I think. Many dictionaries don't provide romaji at all.
These two systems (number+overline) are what I've actually seen used in dictionaries in the wild, in monolingual dictionaries, which might be the real reason why I'm partial to them. Anyone who has used these dictionaries (e.g. on a denshi jisho) is going to be/should be familiar with these systems. If not, it'd still be useful for them to learn about them (since they can then use these common dictionaries).
Btw, having studied Chinese (and French for that matter) I'd easily be confused as to the meaning of this notation: Sono musumé o míreba (different vowel like é vs è, or different tone like shì vs shí?) So this certainly would need explanation. サンドイッチ is visually pretty clear once you've learned there are high/low pitches.
....So anyway, let's mark both kana and IPA/romaji. But first, how should we mark kana? --Vaste 08:05, 26 May 2011 (UTC)
Vaste, you don't seem to have many contributions at the English Wiktionary. Are you here to stay? Many good ideas die before they make any impact. Users lose enthusiasm, real life gets in the way, etc. We have many thousands of Japanese entries, translations and many more yet to be added in a simple or advanced form, with or without the pitch accent. To me, the tone marks seem the easiest and most manageable method to provide the pitch tone. Don't get me wrong, I am not against the idea but it seems like a big job. "...any serious student interested in the pitch accent of a word at least would know the kana." Yes, that's right but we provide Rōmaji as well. The tone marks are written over Rōmaji. --Anatoli 10:06, 26 May 2011 (UTC)
Up until now, I've only made sporadic anonymous edits. Currently I don't plan on making regular edits after this; it was more of a one-off idea for a (semi-)automatic conversion. I estimate this would affect 1000-2000 entries (since the data is ~2000 entries, but some might be missing in English Wiktionary or be difficult to match up). I'm still in the data extracting phase. That said, if this goes well it might inspire me to contribute more. (So far, so fun.)
I guess you mean tone marks like in "Sono musumé o míreba", right? I was envisioning having something like this (is that IPA btw?). Hmm... I guess the formatting could use some work though.
--Vaste 11:50, 26 May 2011 (UTC)
I made a template and two example pages:
How does it look? Vaste 18:13, 6 June 2011 (UTC)

Japanese Pitch Accent formatting

Okay, I tentatively added pitch accent to four entries:

, 思う, 接触, 言葉

It basically looks like this:

  • (Standard Japanese) Pitch Accent: [3]

Other variants currently used can be seen in the linked pages below. None is really widely used (はし-style with ~60 uses is the most common I think).

  • はし line-template + romaji accents
  • 伯爵 IPA (or romaji?) accents, the tone represented is: はくしゃく [0]
  • うつ number + japanese term
  • あう number + japanese term

Any feedback? How's the formatting? Again, I expect this to affect up to 1000-2000 entries (hopefully).

Standard Japanese

Standard Japanese refers to 標準語, a.k.a. "Common Language" (共通語) or NHK-Japanese, which is what is spoken by news broadcasters etc. Basically it is the Tokyo dialect, or at least based on it.

Would it perhaps be more appropriate to use "Tokyo Dialect" or similar instead?

(moved here) Vaste 04:17, 9 June 2011 (UTC)

Category:All topics

This category is a bit of a misnomer because it doesn't actually contain all topics, only the top-level ones. I think it would be good to actually add all topics to it for users who want to see a full list. But we shouldn't use it as a top-level category anymore in that case. Perhaps replace it with Category:Top-level topics or something similar? —CodeCat 22:54, 23 May 2011 (UTC)

I support renaming it to Category:Top-level topics or just Category:Topics. --Daniel 23:13, 23 May 2011 (UTC)
I admit that "Category:Top-level topics" is a better name than "Category:All topics", but the best name is "Content by topic", "Content by subject", or the like. Explanation: The ultimate members of the category "Category:All topics" are the mainspace entries rather than the directly included categories, and these entries are neither "All topics" nor "Top-level topics". Like, Category:Animals does not need to contain any mainspace entry but rather can contain only subcategories, yet "animals" refers to the ultimate members of the category. --Dan Polansky 14:42, 24 May 2011 (UTC)
Well, by analogy to Animals, then, this should be category:Everything.​—msh210 (talk) 15:02, 24 May 2011 (UTC)


I didn't find a Deletion Request for this article - what are the reasons to delete it? It was deleted at Wikinews following deletion here, however I think that the idea of a bus-based tour would be generally useful to newcomers. Do you think it works at Wikipedia, or at other Wikimedia projects as well? Thank you for your responses - in advance! --Gryllida 04:32, 24 May 2011 (UTC)

You can't tour a website in a bus. How many websites have you ever toured in a bus? --Mglovesfun (talk) 10:32, 24 May 2011 (UTC)
Maybe a w:NetBus? :p —CodeCat 10:39, 24 May 2011 (UTC)
Seriously though, if the contributor explained what he/she intends this thing to be, it might be a start. Furthermore I don't think a "tour would be generally useful to newcomers". Most of them just want to look up words by typing them in and hitting enter, they don't really care about deletion debates or Category:Fictional cheeses or whatever it was. --Mglovesfun (talk) 10:45, 24 May 2011 (UTC)
Most users these days don't understand the whole Wiki Tour idea as it's pretty old. It used to be a good way to promote other wikis; a sort of wiki webring. But as is evident in the "patent nonsense" deletion rationale at Wikinews, most people these days are clueless about the concept. -- OlEnglish (Talk) 04:03, 25 May 2011 (UTC)
  • Interesting, why was it appearing at Wikinews/Wiktionary just now? Why wasn't it present here earlier? --Gryllida 14:23, 28 May 2011 (UTC)
Nobody had created it. Mglovesfun (talk) 14:31, 28 May 2011 (UTC)

neologisms in the news

I may be the last to know this, but, in case I'm not, there's a site that has some neologisms found in the news: [4].​—msh210 (talk) 16:53, 24 May 2011 (UTC)

Japanese terms by their kanji

I recently created Category:Japanese terms spelled with 一, on the model of older categories like Spanish terms spelled with Ç and English terms spelled with /.

The new category was populated automatically by {{ja-kanjitab}}. I didn't have to edit any entry for that purpose, simply because all Japanese entries written with kanji are already supposed to use that template just to display each individual kanji.

Since there are hundreds of kanji in Japanese, the new category may (or may not, of course) be a precedent for hundreds of equivalent categories of that language. Would people want that? While a consensus isn't achieved for this little (or potentially big) project, I'm just creating a few more Japanese categories for specific kanji like that; there are four of them by now.

By the way, my personal opinion is: Yes, we should have one category like that per kanji, except any category that would have only one member (the kanji itself). At the very least, all common kanji should be categorized that way. --Daniel 01:06, 25 May 2011 (UTC)
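
To make the idea concrete, here is a small Python sketch of how per-kanji category names could be derived from a term. It only illustrates the naming scheme discussed above and is not the actual logic of {{ja-kanjitab}}; the function names are hypothetical, and for simplicity only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as kanji.

```python
from typing import List


def is_kanji(ch: str) -> bool:
    """Simplified check: basic CJK Unified Ideographs block only (an assumption)."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_categories(term: str) -> List[str]:
    """Return one category name per distinct kanji, in order of first appearance."""
    seen: List[str] = []
    for ch in term:
        if is_kanji(ch) and ch not in seen:
            seen.append(ch)
    return [f"Category:Japanese terms spelled with {ch}" for ch in seen]


if __name__ == "__main__":
    for term in ["一人", "囁く", "ささやく"]:
        print(term, kanji_categories(term))
```

A term written only in kana, such as ささやく, yields no categories, while each distinct kanji in a term contributes exactly one.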

I say no, since Kanji are the usual way of spelling Japanese words. By the same logic, you could create Category:English terms spelled with A. -- Prince Kassad 01:21, 25 May 2011 (UTC)
Not exactly. There are many more Japanese kanji than English letters. Apparently there are many more words per character in English than in Japanese. While 一 is an extremely common kanji, Category:Japanese terms spelled with 一 only has 128 members. Conversely, Category:English terms spelled with A naturally would have thousands of members. For comparison, Category:English terms spelled with É has 379 members. --Daniel 01:29, 25 May 2011 (UTC)
This could be useful, and the fact that it would be populated by a template we already have is appealing; minimal work for our contributors to do to make this work. --Mglovesfun (talk) 16:27, 25 May 2011 (UTC)
Special:WantedCategories is now flooded with hundreds of Japanese categories. Are those all going to be created? —CodeCat 13:30, 28 May 2011 (UTC)

Announcing our new community liaison

I’m delighted to announce that the w:Wikimedia Foundation has engaged Maggie Dennis (User:Moonriddengirl on the English Wikipedia and elsewhere) to serve as our first Community Liaison. The Community Liaison role is envisioned to be a rotating assignment, filled by a new Wikimedian each year, half year or quarter. One of Maggie’s responsibilities is to begin to lay out a process for how this rotating posting would work.

Maggie has been a contributor to the projects since 2007 and is an administrator on the English Wikipedia and an OTRS volunteer. She has over 100,000 edits, including edits to 40 of the language versions of our projects. Her broad experience and knowledge made her a natural fit for this role.

This role is a response to requests from community members who have sometimes felt they didn’t know who to ask about something or weren’t sure the right person to go through to bring up a suggestion or issue. Her initial thrust will be to create systems so that every contributor to the projects has a way to reach the Foundation if they wish and to make sure that the Foundation effectively connects the right resources with people who contact us. If you aren’t sure who to call, Maggie will help you. Obviously, most community members will never need this communications channel - they’re happy editing, doing the things that make the projects great - but we want to make it as easy as possible for people to communicate with the Foundation.

The job of the liaison will have two major parts. First are standard duties that every liaison will perform which may include maintaining a FAQ about what each department does, making sure that inquiries from email or mailing lists are brought to the attention of appropriate staff members, etc. However, we also want liaisons to be free to pursue unique projects suited to their particular skill sets. Maggie will develop such projects in the coming weeks.

Maggie will be on the projects as User:Mdennis (WMF) and can be reached at mdennis Her initial appointment runs for six months. I look forward to working with Maggie in this new role!

Philippe (WMF) 22:01, 25 May 2011 (UTC)

Attestation of extinct languages 2

I have created a follow-up vote: Wiktionary:Votes/pl-2011-05/Attestation of extinct languages 2. The vote differs from the currently failing Wiktionary:Votes/pl-2010-12/Attestation of extinct languages in two ways: (a) it does not include "mention", and (b) it does away with "Extinct languages" section altogether. I have left the date of the start undetermined, to allow for the discussion to proceed as long as needed. --Dan Polansky 13:10, 26 May 2011 (UTC)

On that matter, I have created a competing proposal, Wiktionary:Votes/pl-2011-05/Attestation of extinct languages 3, which would allow exceptions to be added per language. -- Prince Kassad 15:34, 26 May 2011 (UTC)
What if both pass? —CodeCat 15:52, 26 May 2011 (UTC)
They're not mutually exclusive. You can both have a blanket exception for extinct languages, and further exceptions for individual languages. -- Prince Kassad 16:17, 26 May 2011 (UTC)
Please, please don't have them running at the same time; a similar thing happened with place name votes and it created a mess. --Mglovesfun (talk) 12:09, 27 May 2011 (UTC)
I do not see any problems with the two votes running at the same time. But if there are more people who do not want the votes to run at the same time, I am willing to look into how to determine which of the two votes should be postponed. Using the first-come-first-served heuristic, the vote I have created should start first. But I am willing to let us throw dice in some way to determine which of the two votes should run first. Again, this presupposes that more people actually see a problem with the two votes running in parallel; I do not see any such problem. The two votes could have been proposed by the same person: what they propose is complementary rather than mutually exclusive. --Dan Polansky 10:20, 28 May 2011 (UTC)
It's tough when one vote could influence the result of another. Mglovesfun (talk) 11:05, 29 May 2011 (UTC)

Categorizing suffixes by part of speech

In many languages, particularly those that inflect, suffixes always form particular parts of speech. In other words, the POS of a word can be determined by the suffix itself. For example, -able normally derives adjectives, and any words ending in -able that are not adjectives are usually derived from the adjective. For that reason I'd like to propose that we allow suffixes to be categorised by part of speech, and also that we allow new POS headers for that purpose. This means that for every POS we currently allow, we should allow another one with 'suffix' after the name, like 'noun suffix' or 'verb suffix' and so on. This would be very useful for many languages. —CodeCat 10:37, 27 May 2011 (UTC)

This goes on to a point made by EncycloPetey about whether inflectional endings like -s (for example, in English) are suffixes or just inflectional endings. I was also looking at Category:English words suffixed with -eth. If we tend not to categorize entries in Category:English words suffixed with -s when they are just plurals or third-person singulars, why do it with -est and -eth? Category:English words suffixed with -eth is essentially a subcategory of Category:English third-person singular forms.
In answer to your specific question; I'd probably oppose it; explain in the entry that -able is used to form adjectives, but keep the header as simply "suffix". --Mglovesfun (talk) 11:47, 27 May 2011 (UTC)
But this specifically excludes inflectional forms. Those would still use the 'suffix' header or maybe 'ending' instead. I am referring more to suffixes that derive a completely new headword, such as the way -able derives adjectives from other existing words. Often, these entries are treated as the part of speech they create, along with inflection tables and such. For example the Finnish word -htaa is formatted as a verb entry, even though the header is 'suffix'. The POS header just doesn't really describe the entry very well; it says it is a suffix, but not what kind of suffix. It would be like saying 'nominal' and then writing whether it is a noun or adjective in the definition, but putting all nouns and adjectives together in 'English nominals'. —CodeCat 12:01, 27 May 2011 (UTC)
Oh, and also notice that the Finnish verb suffix entry is categorised in a specific category for verb conjugation as well. —CodeCat 12:03, 27 May 2011 (UTC)
In reply to "it says it is a suffix, but not what kind of suffix": I'm saying that sort of information doesn't go in the header. There's the possibility of using infl for something like {{infl|fi|suffix|nominal suffix|cat2=nominal suffixes}}, but I personally at least wouldn't put that information in the header, as opposed to the head word (the one in bold). --Mglovesfun (talk) 12:06, 27 May 2011 (UTC)
For Proto-Germanic entries I've even gone so far as to just categorise suffixes under the part of speech directly, such as Template:termx. But in any case, the point I'm trying to make is that 'suffix' is not a part of speech, it says nothing about how the suffix behaves grammatically, while 'verb suffix' does. I think even if we don't require it, we should at least allow it. And if we don't allow it as a part of speech header, then at least as a category such as English adjective(-forming) suffixes. It would make the suffixes a lot easier to keep apart, because right now the category is very messy. —CodeCat 12:12, 27 May 2011 (UTC)
I get it, but my point remains the same. Let's let someone else contribute. --Mglovesfun (talk) 12:14, 27 May 2011 (UTC)
I've knocked up User:Mglovesfun/-eur as an example. --Mglovesfun (talk) 12:28, 27 May 2011 (UTC)
Nominal suffix is ambiguous, because it can refer to adjective suffixes as well. —CodeCat 12:31, 27 May 2011 (UTC)
I support having categories for affixes that convert words to a certain part of speech, at least for certain languages and parts of speech. Dunno about the other aspects of the proposal. I mean, is un- an "adjective prefix" and "verb prefix" because it converts adjectives to adjectives and verbs to verbs? —RuakhTALK 13:28, 27 May 2011 (UTC)
un- isn't any part of speech, it's a generic prefix that can be attached to words of any part of speech, and doesn't modify that part of speech. The same applies to the Finnish possessive suffixes like -ni as well. —CodeCat 13:33, 27 May 2011 (UTC)
Re: "it's a generic prefix that can be attached to words of any part of speech": Sorry, but I don't think that's true. From an etymological or semantic perspective, it's two separate prefixes, which attach to different parts of speech. One attaches to adjectives and nouns (and maybe adverbs?), means "not", and produces words with the same part of speech; the other attaches to nouns and verbs, means "reverse", and produces verbs. This creates some fun ambiguities, by the way; see [[unseen]] for a good example. When it's un- + seen it means "not seen" ("unseen dangers lurking around every corner"), but when it's unsee + -en it means "subjected to a reversal of seeing" ("what has been seen, cannot be unseen"). This is because the former uses the adjective-prefix un-, and the latter uses the verb-prefix un-.
Re: "and doesn't modify that part of speech": I'll ignore, for the sake of argument, the fact that (according to our entry) it can make nouns from verbs. Beyond that — nothing in your proposal suggests that these categories would only be for affixes that do modify the part of speech, hence my comment.
RuakhTALK 17:08, 27 May 2011 (UTC)
As far as I can see on the entry, all the example uses of un- there are added to a word with the same part of speech. I don't see any examples where the POS changes. —CodeCat 17:28, 27 May 2011 (UTC)
Indeed, but I assume the entry didn't just make that up . . . —RuakhTALK 18:07, 27 May 2011 (UTC)
unaltering = un- + altering. I don't expect this to be an isolated example. DCDuring TALK 19:44, 27 May 2011 (UTC)
Altering is a present participle, which is syntactically an adjective with some extra spice that regular adjectives don't have. So in that sense the POS doesn't change: the (un)altering measurement. In both cases they modify a noun. —CodeCat 20:17, 27 May 2011 (UTC)
Introducing a set of categories that does not square with our PoS headers seems a poor approach from a user perspective - unless the categories are solely for our own amusement, in which case they should probably be hidden so as not to confuse a user who gets to the bottom of an entry. DCDuring TALK 20:25, 27 May 2011 (UTC)
We already distinguish 'noun form' from 'noun' in categories while both use the POS header 'noun'. If we use the header 'suffix' and categorise as either 'suffix' or 'noun suffix' I don't see how that is any different. —CodeCat 20:26, 27 May 2011 (UTC)
In the case of "unaltering", how would you have categorizers categorize? What would the documentation say? DCDuring TALK 21:38, 27 May 2011 (UTC)
Unaltering wouldn't be categorised any differently from the way it is now. It's only the categorisation of the suffixes themselves that I am trying to improve. To create subcategories in Category:English suffixes per part of speech. —CodeCat 21:43, 27 May 2011 (UTC)
Which categories would un- belong in? DCDuring TALK 22:02, 27 May 2011 (UTC)
The same as it belongs to now, because it has no inherent part of speech. —CodeCat 22:05, 27 May 2011 (UTC)
-able can be used to form nouns too. Such as expendable. Mglovesfun (talk) 10:23, 28 May 2011 (UTC)
But that is a noun formed from the adjective 'expendable', not from the verb directly. There is a difference between saying that the suffix actually forms the noun, and saying that the noun simply ends with it. Expendable already was an adjective before it became a noun, so the -able suffix didn't form the noun, it formed the adjective. —CodeCat 10:34, 28 May 2011 (UTC)
But, couldn't you say the same for un- in undauntingly? The un- doesn't form an adverb here, it forms an adjective to which -ly is added. Un- doesn't really work for any part of speech, and even if it did, it would be a pretty small number. So why not just categorize in all the relevant categories? So far that would be verbal suffixes and adjectival suffix; it hasn't been shown to have any other function. Mglovesfun (talk) 10:37, 28 May 2011 (UTC)
I'm not asking you to categorise everything. I'm just asking that if a clear categorisation can be made, that it be allowed as well. The top-level category will still exist and it will still contain entries. It would just be easier to find suffixes that form nouns. —CodeCat 11:17, 28 May 2011 (UTC)

A category for 'words suffixed with'

Currently, categories for words containing a certain suffix are put in Category:English suffixes. But as you can see that category is getting very big, and it's hard to find the remaining few categories among them. For that reason, I would like to create a new category named Category:English words by suffix and add all the suffix categories into that. And the same for prefixes, and of course for other languages as well. —CodeCat 11:40, 28 May 2011 (UTC)

Good idea, I think. Something similar would seem appropriate for almost all other instances of overpopulation by subcategories. DCDuring TALK 15:08, 28 May 2011 (UTC)
Support (weakly as opposed to strongly), yeah, makes the category a bit easier to navigate. --Mglovesfun (talk) 11:04, 29 May 2011 (UTC)

Wiktionary:About Kinyarwanda

This isn't really a policy page. Should it be moved to a subpage of Requested entries? -- Prince Kassad 12:21, 29 May 2011 (UTC)

Move to Wiktionary:Requested entries (Kinyarwanda). Mglovesfun (talk) 13:03, 29 May 2011 (UTC)

zh-cn/zh-tw RFD

I have been notified that an effort is underway to remove/replace the zh-cn/zh-tw tags. Could someone kindly point me to the discussion/vote on this? Since I'm one of the few people who actually creates entries that rely on these categories, I thought I should at least be given the opportunity to read and comment on the debate. Thanks. -- A-cai 23:32, 29 May 2011 (UTC)

The debate is currently at the bottom of WT:RFDO. —CodeCat 23:35, 29 May 2011 (UTC)
Thanks. I have posted a comment there. -- A-cai 00:11, 30 May 2011 (UTC)
Regarding "an effort is underway to remove/replace the zh-cn/zh-tw tags" not that I'm aware of, just a discussion about it. Mglovesfun (talk) 12:25, 30 May 2011 (UTC)

Ossetian ӕ

¶ I noticed that just about all of the Ossetian words on this website contain the Latin character æ but not the Cyrillic ӕ. I know that script‐mixing is strongly discouraged, but even the Ossetian version of Wikipedia contains the Latin æ. ¶ Should we move all those entries to ones which are completely Cyrillic? I already did one, but I would like consent before this may continue. --Pilcrow 01:21, 30 May 2011 (UTC)

I say yes. But I think we have people here whose opinions differ. -- Prince Kassad 13:07, 30 May 2011 (UTC)
I give my consent. Just keep the redirects. --Vahag 17:49, 30 May 2011 (UTC)
Ossetian has used the Latin æ since the beginning. The so-called Cyrillic ӕ is a new invention, created maybe a decade ago by an unofficial Anglo-American body called the Unicode Consortium. The UC devised the Cyrillic ӕ to be there in case the Ossetians might want to switch over to a special non-Roman letter. The Ossetians have never accepted the new letter and continue to use the Roman æ as they have always done. Ossetian computers have keyboard drivers that select the Latin æ, not the Cyrillic one. Ossetian spell-checkers are developed around the Latin æ, not the new letter. —Stephen (Talk) 02:39, 31 May 2011 (UTC)
¶ I personally do not have the option of selecting an Ossetian keyboard layöut in my version of Windows®. I had a suspicion that no official Ossetian layöuts exist, much like how there is no official Esperanto keyboard (I could be wrong, of course). ¶ I can re‐move the content of ӕфсӕ back to its original entry, if desired. --Pilcrow 02:59, 31 May 2011 (UTC)
Whatever the hell it means to use the Latin æ before computer typesetting. I don't know how you call an association that has the governments of India and Bangladesh as members Anglo-American, but in any case Unicode has always worked hand in hand with the very official International Organization for Standardization. The standards-compliant thing here is obviously to use the Cyrillic letter in the Cyrillic text.--Prosfilaes 03:09, 31 May 2011 (UTC)
Each language and national literature is free to choose its own alphabet, orthography, numbering system, and punctuation, including the recently developed codepoints. There is no law, policy, convention, treaty, or agreement that obligates any culture or people to follow the suggestions of the Unicode Consortium. They can do it if they choose to, or they can choose not to. —Stephen (Talk) 06:18, 31 May 2011 (UTC)
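For readers following the encoding side of this thread, here is a minimal Python sketch of the distinction being argued over. It only shows that the two letters are separate code points (U+00E6 and U+04D5) and how a mixed-script entry title could be detected and converted; it takes no position on which letter Ossetian entries should use. The helper names `scripts_in` and `to_cyrillic_ae` are hypothetical, and reading the script off the first word of the Unicode character name is a simplifying assumption rather than a proper script lookup.

```python
import unicodedata

LATIN_AE = "\u00e6"     # LATIN SMALL LETTER AE
CYRILLIC_AE = "\u04d5"  # CYRILLIC SMALL LIGATURE A IE


def scripts_in(word):
    """Return the set of scripts used by the letters of `word`.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", ...) is a good enough stand-in for the script.
    """
    return {unicodedata.name(ch).split(" ")[0] for ch in word if ch.isalpha()}


def to_cyrillic_ae(word):
    """Replace Latin ae/AE with the Cyrillic look-alikes (hypothetical helper)."""
    return word.replace("\u00e6", "\u04d5").replace("\u00c6", "\u04d4")


# The entry mentioned above, spelled with the Latin letter at both ends.
mixed = "\u00e6\u0444\u0441\u00e6"
print(sorted(scripts_in(mixed)))
print(sorted(scripts_in(to_cyrillic_ae(mixed))))
```

If the entries were moved, the same check could be used to list titles that still mix scripts, so that redirects can be kept from the old spellings.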

poscatboiler in articles categories

Hi. I just noticed there is something not right in the poscatboiler usage in most (or probably all) categories under Category:Articles by language. Could someone more knowledgeable of the underlying templates adjust please? Thanks, Malafaya 15:37, 30 May 2011 (UTC)

I fixed it now, it was a small mistake I made when I created the new poscatboiler pages. —CodeCat 15:49, 30 May 2011 (UTC)


I would like to post a message to the one who disappeared a few days ago. You chose to protect the page (as a sysop on fr: I can understand why), and I know this is not the best page for that. But in fact I think the page where I post it is not important; the only important thing is to tell this man that I (and many others) love him. --ArséniureDeGallium 20:03, 30 May 2011 (UTC) My English is very bad when speaking of sentiments. Sorry. Please move this message to the right page.

Robert Ullmann, I take it? I think he died a while ago; we just had no way of knowing. I'd like to say that he brought a lot of good things here, not least AutoFormat and Interwicket, and he created some cleanup lists for me. So thank you to him. Mglovesfun (talk) 23:42, 30 May 2011 (UTC)


I sometimes scan old books in the vague hope that someone will find the time to proofread them and make use of them. Back in 2004 I scanned a two-volume (470 + 485 pages) Swedish-to-Latin translation dictionary, printed in 1875, with the Swedish in blackletter (Frakturstil). Much to my surprise, one person has proofread the entire text. Here is a sample page, where you see the scanned image and you can scroll down to the proofread text. It's complete with citations in literature (C. = Marcus Tullius Cicero) of the Latin words and phrases. Here's the entry for bookworm (from that same page):

a. som lefwer helt och hållet i böcker, bokmal: (who lives entirely in books)
qui libris heluatur (C. de Fin. III. § 7; *heluo librorum); homo legendi avidissimus, studiosissimus, qui inexhausta aviditate legendi tenetur, qui non potest satiari legendo (C. l. c.).
b. (wanl.) som wurmar för att köpa och ega böcker: (who desires to buy and own books)
insanus (C. in Verr. IV. 1) librorum coemptor, conquisitor.
studium, insania librorum coĕmendorum.

Now, how on earth can we make this investment useful within Wiktionary? I looked briefly at the Latin Wiktionary, but it seems very small (8,000 entries) and not very active. I personally don't speak Latin, but I could help to convert the 1870s Swedish to modern orthography. --LA2 22:35, 30 May 2011 (UTC)

I think it would be better to ask at the Latin and Swedish Wiktionaries... —CodeCat 23:38, 30 May 2011 (UTC)
We might be able to extract word lists at least, if anyone is interested that is. Nadando 00:25, 31 May 2011 (UTC)
But the spellings of the words are very old, so we couldn't really do much more than provide obsolete spellings. Bokwurm is now spelled bokvurm (why not bokorm?) and lefwer is now lever. —CodeCat 00:36, 31 May 2011 (UTC)
We could also extract all the literary citations, but here you find Cicero's use of "heluo librorum" under the Swedish "bokwurm" headword, and Wiktionary (in any language) would stick it under the Latin headword.
(vurm is actually a Swedish word (loan from German) for desire or passion (or a passionate person), rather than worm. The animal is known as bokmal.) --LA2 07:53, 31 May 2011 (UTC)

Straw poll: Topical category languages

This is a straw poll per discussion from this vote. Indicate support for the option or options which you would endorse. Since this is run by approval voting, it is automatically assumed that all other options are opposed. DAVilla 20:28, 31 May 2011 (UTC)
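As a rough illustration of the counting rule described above, approval voting can be tallied as follows. This is only a sketch: the voter and option names are placeholders rather than the actual ballots in this poll, and `tally_approval` is a hypothetical helper name.

```python
from collections import Counter


def tally_approval(ballots):
    """Count approvals per option.

    `ballots` maps each voter to the options they support. Any option a
    voter does not list is treated as opposed, so it simply gets no count.
    """
    counts = Counter()
    for approved in ballots.values():
        counts.update(set(approved))  # a voter counts once per option
    return counts.most_common()


# Placeholder ballots, not the real votes from this page.
example = {
    "voter1": ["A", "B"],
    "voter2": ["A"],
    "voter3": ["B", "C", "A"],
}
print(tally_approval(example))
```

Each voter may support any number of options, and the option with the most supports leads; an explicit oppose carries the same weight as saying nothing.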

Category:de:Mountains and Category:de:Physics

  1.   Support Category names which include the language name can be confusing in certain cases. -- Prince Kassad 21:10, 31 May 2011 (UTC) (addendum: since that doesn't seem to be clear, I'm also explicitly opposing the last option.)
  2.   Support. Only the ISO 639-3 codes are unambiguous. Category:toi:Mountains is clear; Category:Mountains in Tonga would seem to be about something completely different. —Angr 21:31, 31 May 2011 (UTC)
    Are you saying that there is both a place and a language called Tonga? —CodeCat 21:35, 31 May 2011 (UTC)
    See Tonga and ethnologue:toi -- Prince Kassad 21:39, 31 May 2011 (UTC)
  3.   Support --Vahag 22:03, 31 May 2011 (UTC)
  4.   Support. --Yair rand 22:44, 31 May 2011 (UTC)
  5.   Support —Stephen (Talk) 02:36, 1 June 2011 (UTC)
  6.   Support. Cleanest option. bd2412 T 20:34, 2 June 2011 (UTC)
  7.   Support Dan Polansky 09:15, 3 June 2011 (UTC)
  8.   Support . Short and simple. --Makaokalani 13:24, 3 June 2011 (UTC)

Category:German: Mountains and Category:German: Physics

  1.   Weak support. RuakhTALK 22:34, 31 May 2011 (UTC)
  2.   Support. Anything with a prefix that can be removed so that the readers don't have to see it is fine. --Yair rand 22:44, 31 May 2011 (UTC)
  3.   Support Dan Polansky 09:15, 3 June 2011 (UTC)

Category:German – Mountains and Category:German – Physics

  1.   Support. RuakhTALK 22:34, 31 May 2011 (UTC)
  2.   Support Dan Polansky 09:15, 3 June 2011 (UTC)

Category:Mountains (German) and Category:Physics (German)

  1.   Support DAVilla 20:37, 31 May 2011 (UTC)
  2.   Support CodeCat 20:48, 31 May 2011 (UTC)
  3.   Support. RuakhTALK 22:34, 31 May 2011 (UTC)
  4.   Support Dan Polansky 09:15, 3 June 2011 (UTC)

Category:Mountains in German and Category:Physics in German

  1.   Support DAVilla 20:37, 31 May 2011 (UTC)
      Support CodeCat 20:48, 31 May 2011 (UTC) Retracted my vote because of the ambiguity explained above. —CodeCat 21:45, 31 May 2011 (UTC)
  2.   Support Dan Polansky 09:15, 3 June 2011 (UTC)

Category:German terms relating to mountains and Category:German terms relating to physics

Note: this option was added later than the others, so users who support the others are not necessarily implicitly opposing this one. —RuakhTALK 22:34, 31 May 2011 (UTC)

  1.   Support. DAVilla 18:56, 4 June 2011 (UTC)
  2.   Support. RuakhTALK 22:34, 31 May 2011 (UTC)
  3.   Support but a shorter name would be nice. I just don't know how to shorten it. —CodeCat 22:37, 31 May 2011 (UTC)
    I agree. Even worse is the presumptive umbrella category, Category:Terms relating to mountains by language! —RuakhTALK 22:40, 31 May 2011 (UTC)
    It could be category:Terms, by language, relating to mountains, which is IMO clearer.​—msh210 (talk) 18:08, 1 June 2011 (UTC)
  4.   Support --Daniel 05:51, 1 June 2011 (UTC)
    These names are written in plain English, so their exact meanings are easier to learn. I would also support the additional proposal of shortening these names while keeping them in plain English, if possible. Well, Category:German terms involving mountains would be one word less, at least.
    I'm certainly not saying that "Category:Mountains in German", "Category:Mountains (German)" or "Category:German: Mountains" are hard to read or understand. They're very easy too. However, their exact meaning may be a source of dispute. I've seen various disputes about this, and I'd like very much to avoid them.
    If "Category:Mountains (English)" is about the topic "Mountains", then it should contain montane, mountaintop, summit, summiteer, mountainous and arguably faith will move mountains, in addition to containing all names of mountains.
    If, on the other hand "Category:Mountains (English)" should contain only names of mountains (e.g. K2, Everest), then it should not contain the aforementioned terms.
    Here are some current categories and their members, for comparison of that difference of scope:
    --Daniel 05:51, 1 June 2011 (UTC)
    If this option passes, we can discuss splitting up the categories into those with names like Category:German terms relating to mountains and those with names like Category:German names of mountains.​—msh210 (talk) 18:08, 1 June 2011 (UTC)
  5.   Support. The other options are just too hard to read and understand, especially if English and/or German is not your first language(s). I'd like to keep "relating to" instead of "involving" though. Facts707 05:55, 1 June 2011 (UTC)
  6.   Support. SemperBlotto 07:18, 1 June 2011 (UTC)
  7.   Support.​—msh210 (talk) 18:08, 1 June 2011 (UTC)
  8.   Support Bequw τ 21:10, 1 June 2011 (UTC)
  9.   Support, as I do not particularly oppose this option. How about "regarding" instead of "relating to" or "involving"? bd2412 T 20:36, 2 June 2011 (UTC)
      Oppose Dan Polansky 09:15, 3 June 2011 (UTC) I have stated an explicit oppose to fit an explanation here; this oppose is the same as no comment at all per the voting scheme of the poll; this oppose is indented to make counting of supports easier. Categories that are named in plural often are not intended to contain all sorts of terms relating to the things named by the plural. Category en:Mammals does not contain all sorts of terms relating to mammals ("mammary gland") but only mammals, so the category is a hyponymic one. Category names can be short at the cost of ambiguity, as we have a category description in the boilerplate shown at the top of the category for those users browsing the topical categories for whom "Mountains (German)" does not provide enough detail. I admit that the boilerplate for "template:topic cat" now defaults to something like "The following is a list of terms related to HEADWORD", but that is suboptimal, and can be replaced whenever someone creates a non-default description for the particular category. If the intention of this option is to so change the scope of each topical category that it is intended to contain all sorts of words relating to its headword, then the effect is not a mere renaming but rather a vast change in the scoping strategy for topical categories. --Dan Polansky 09:15, 3 June 2011 (UTC)
    There are some extended possibilities of names of categories: msh210 mentioned "Category:German terms relating to mountains" and "Category:German names of mountains", and Ruakh mentioned at this vote "Category:English terms relating to trees" and "Category:English terms for types of tree". I don't think this option involves changing the scope of all hyponymic categories to become "topical" categories, although I acknowledge that the possibility of having "extended possibilities of names of categories" is not being expressly voted here.
    I, specifically, approve the idea of using category names written in plain English, as opposed to parentheses, colons and language codes, to clarify the scope of each category, while retaining their scopes. (As an exception, I naturally don't oppose the creation of discussions to request changes of scope of particular categories, but I don't think this is one of them.) --Daniel 09:36, 3 June 2011 (UTC)
    I think I understand. What you are saying is that I can read this option as one that proposes the use of long names without committing them to the form of "X terms relating to Y". I am unenthusiastic about such long names, but I am not going to oppose them in a formal vote as long as they do not mislead about the scopes of categories. The particular form "X terms relating to Y" does mislead about the scopes when applied to "Mountains", "Mammals", and similar. --Dan Polansky 10:26, 3 June 2011 (UTC)
    It seems to me that there are two separate inquiries involved here. One is whether to use, for example, "de:" or "German", and the other is whether to say "Mountains" or something more clearly explaining the scope of the category. There is no reason we could not have a scheme of Category:de:Terms relating to mountains or Category:de:Mountain terms or Category:de:Mountain terminology. bd2412 T 14:44, 3 June 2011 (UTC)


I like the idea of using language names - people other than linguistic students, professional linguists and Wiktionary users are almost never gonna understand hu, he, fy, nds etc. I'm not overly bothered which one wins out. My marginal favorite is Category:Physics (German), I do like a bit o brevity though, Category:German terms relating to physics seems a bit long to me, but if other editors like it, great! --Mglovesfun (talk) 12:27, 1 June 2011 (UTC)
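To compare the polled naming schemes side by side, the following Python sketch builds each candidate category name from a language code, a language name and a topic. The function name `category_names` and the short labels used as dictionary keys are made up for this illustration; only the name patterns themselves come from the poll options above.

```python
def category_names(code, language, topic):
    """Return the category name each polled scheme would produce."""
    return {
        "code prefix": f"Category:{code}:{topic}",
        "name and colon": f"Category:{language}: {topic}",
        "name and dash": f"Category:{language} \u2013 {topic}",
        "parenthetical": f"Category:{topic} ({language})",
        "in-language": f"Category:{topic} in {language}",
        "plain English": f"Category:{language} terms relating to {topic.lower()}",
    }


for scheme, name in category_names("de", "German", "Mountains").items():
    print(f"{scheme}: {name}")

# The ambiguity raised under the first option: the "in-language" form for
# the Tonga language (code toi) reads like a category about a place.
print(category_names("toi", "Tonga", "Mountains")["in-language"])
```

Laid out this way, the two questions raised in the discussion stay separate: the code versus name choice only changes the language part, while the "terms relating to" wording changes how the scope of the category reads.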