Wiktionary:Beer parlour/2007/September

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

"Third-person" vs. "Third person"

Perhaps this is a silly question, but things like Category:Third person singular forms should be Category:Third-person singular forms (and subcategories, and similar categories for other people), shouldn't they? Our articles indicate third-person is the adjective form, while the third person is the noun, which makes sense, so the hyphenated form makes more sense in this context, no? Dmcdevit·t 06:05, 1 September 2007 (UTC)[reply]

I agree with the opinion that's probably intended, which is that the form "third-person" is for attributive use and the form "third person" is for other uses. (I don't think it's ever truly an adjective, and if it were, I think I'd apply the same difference-splitting rule.) So, I'm all for moving them. —RuakhTALK 07:07, 1 September 2007 (UTC)[reply]
At the same time, I would also like to move Category:English third person singular forms to Category:Third-person singular forms (English) (or possibly something longer). When we're finally able to derive language from the language code, that's the format that I believe would be most supported. Stephen doesn't like the two- and three-letter codes because they're easy to mix up, and putting the language at front is ambiguous. Opinions? DAVilla 19:38, 1 September 2007 (UTC)[reply]
Why does putting the language at the front look ambiguous? I think the parenthetical just looks bad, and even "in English" at the end would be better. Also I do very much prefer putting the tense in the category, too. Take a look at some of the categories {{es-verbform}} spits out at User talk:Dmcdevit/Test. Dmcdevit·t 20:37, 1 September 2007 (UTC)[reply]
Variations such as "in English" are completely acceptable. DAVilla 04:40, 2 September 2007 (UTC)[reply]
Since the basic level of subdivision on Wiktionary is the language, it makes most sense to have "English" or "en:" at the front of any category specific to English. --EncycloPetey 01:45, 2 September 2007 (UTC)[reply]
To address both of you, of course it isn't a problem for parts of speech, but it doesn't generalize to the topical categories, where "English given names" can mean the given names used in England, and "Russian mountains" are the mountains of Russia, not the names of all mountains in Russian. Do "English:Given names" and "Russian:Mountains" stand out that much to you? It would be much clearer to say both "of Russia" and "in Russian". The two- and three-letter codes are not ambiguous because they associate with language specifically, but these are somewhat obscure. If that standard ever changes, it would be best to have the POS categories already match it. DAVilla 04:40, 2 September 2007 (UTC)[reply]
We've settled that issue, please don't exhume its corpse again. --EncycloPetey 04:53, 2 September 2007 (UTC)[reply]
I'm not suggesting that we change the topical categories. I'm saying that we should have a naming standard for parts of speech that is consistent with one for topical categories if the latter should ever change. DAVilla 05:04, 2 September 2007 (UTC)[reply]
It's a different topic entirely, though. If we ever change the names in the future, we will change all the categories that need changing at once, but there is no need to think about that here where we are just hyphenating the titles. Dmcdevit·t 05:19, 2 September 2007 (UTC)[reply]
It's not just about the hyphenation. The form-of templates themselves are being completely rewritten, as discussed above. DAVilla 05:24, 2 September 2007 (UTC)[reply]

In any case, this is the list of categories that need to be changed:

Could it be done with some sort of a bot? Dmcdevit·t 05:19, 2 September 2007 (UTC)[reply]

The categories are correct the way they are. Try searching Google Books for "third person singular". The form without the hyphen is overwhelmingly preferred. --Ptcamn 09:41, 3 September 2007 (UTC)[reply]
Not a fair statement, since "third person singular" is a noun phrase in its own right. The issue here is whether that should be hyphenated when it's used to modify another noun. --EncycloPetey 00:21, 5 September 2007 (UTC)[reply]
Surely that would be "third-person-singular" then?
But anyway, my point still stands:
  • 1994: Merriam-Webster's Dictionary of English Usage
    Although the lack of a common-gender third person singular pronoun has received much attention in recent years from those concerned with women's issues, [...]
  • 1999: Mary E. Coffman Crocker, Schaum's Outline of French Grammar
    Some verbs use the third person singular form of the present tense rather than the infinitive as the future stem.
  • 2001: Ludovica Serratrice, "The emergence of verbal morphology and the lead-lag pattern issue in bilingual acquisition", in Trends in Bilingual Acquisition
    Although at 2;3.17, 2;3.7, and 2;4.14 the third person singular present indicative inflection is used productively, it is not until 2;5.6 that the first contrasts emerge.
  • 2002: Joanne Scheibman, Point of View and Grammar: Structural patterns of subjectivity in American English conversation
    Third person singular subjects by tense and verb type
  • 2002: Lieselotte Anderwald, Negation in Non-Standard British English: Gaps, Regularizations, and Asymmetries
    It is also interesting to note that although there are areas where in't does not occur with a non-third person singular pronoun, although it occurs with a third person singular pronoun, the reverse is never the case.
--Ptcamn 09:28, 5 September 2007 (UTC)[reply]

sum of parts

If a phrase's meaning could be derived from the meanings of its components except that the phrase is used in a more general context than one or more of its parts, it is considered idiomatic. Examples: one part is obsolete whereas the phrase is merely dated; one part is dated whereas the phrase is not; one part is used only in a mathematics context whereas the phrase is not so restricted; one part is marked (chiefly British) whereas the phrase is not so marked.

What do you all think of the above passage (which is not taken from anywhere; I just wrote it)?—msh210 06:36, 2 September 2007 (UTC)[reply]

(The other way around, where neither part is restricted but the whole is used in mathematical contexts, would already pass.) I like it, although I would add that it has to be clearly so, since we don't have criteria for obsolete and dated. Maybe archaic and dated, or obsolete and contemporary, would make a better case. Do you have any concrete examples? Ultimately it would depend on whether the community found those acceptable for your reasons. DAVilla 12:35, 3 September 2007 (UTC)[reply]
Concrete examples:
msh210 18:14, 5 September 2007 (UTC)[reply]
I find it difficult to read meaning from the passage as written. --EncycloPetey 00:19, 5 September 2007 (UTC)[reply]
See examples above.—msh210 18:14, 5 September 2007 (UTC)[reply]
FWIW, I don't recall ever having seen the "P" capitalized, before now, in pleased as punch. --Connel MacKenzie 19:00, 5 September 2007 (UTC)[reply]
I guess I'm not seeing the purpose behind the question. You've asked "What do you think?" without any context. It may be true that entries meeting the stated condition are idiomatic, but so what? And it doesn't mean that all idiomatic forms will meet the condition either. --EncycloPetey 03:49, 6 September 2007 (UTC)[reply]
The "What do you think?" meant two things. One, do you think that this accurately represents what is obviously part of the community's opinion on set phrases already? Two, if it's not obviously part of the community's opinion, then is it part of your opinion?—msh210 17:03, 6 September 2007 (UTC)[reply]
I'm not sure. If a phrase would be completely sum-of-parts to someone who knew what all the parts meant, and the only issue is that one part's rarity makes it likely that someone wouldn't know what it meant, well then, it seems like having an entry for that part should suffice: you look up the part and the phrase makes sense. On the other hand, if you look up the part and see it marked (dated), you would naturally assume that the book uses dated terms, even if that's not actually true and the book is simply using a stock expression that contains an otherwise-dated term. —RuakhTALK 04:12, 6 September 2007 (UTC)[reply]
Could that not be explained in a usage note at the said dated term’s entry? –Something like: “Usage of this word by itself has declined considerably since [DATE]; however, it still sees frequent use in the set phrase ossified phrase.”? † Raifʻhār Doremítzwr 11:30, 6 September 2007 (UTC)[reply]

Circular definitions

Do you know of any good way to mark circular definitions, or pairs of definitions relying on each other? See for example walkway and passageway. Circeus 21:22, 2 September 2007 (UTC)[reply]

We don't have that as of yet. The best thing that I've found is to try and rewrite the circular definitions so that one of them has a distinct definition of its own. If you find any that you don't feel comfortable rewriting, feel free to leave a note on my talk page and I'll try and tackle it. --Neskaya talk 23:41, 2 September 2007 (UTC)[reply]
Or bring it up in the tearoom. If it were completely circular that's a more serious flaw, but this one wasn't too bad. You have to consider that people might know what one is and not the other, and by saying "X or Y-Z" you'd be helping the X crowd who find the Y-Z explanation cumbersome. DAVilla 12:40, 3 September 2007 (UTC)[reply]
Every single one of our definitions uses words that are defined elsewhere using different words that are defined elsewhere . . . . . In the end, ALL of our definitions are circular, it's just that some circles are bigger than others. SemperBlotto 13:29, 3 September 2007 (UTC)[reply]
Well, that's true to an extent: what you tend to have in dictionaries is term A is defined using term B, which is defined using term C, etc, and by the time you get to terms X, Y and Z (say) the terms are so basic that defining Z inevitably uses X and Y. I'm talking about words like of and the here (the latter being traditionally defined as "the definite article", and so defined in terms of itself). This cannot be avoided without recourse to a metalanguage, which English does not have. However, that is no reason for words near the beginning of the chain to be part of loops; no word should be defined as part of a chain of synonyms that ends up being a loop (A means B, which means C, which means D, which means A).
As for tracking these down, this would require a sophisticated bot or other software, and would be a mammoth task even then (in algorithmic terms, multiple traversals of an enormous graph). — Paul G 16:17, 3 September 2007 (UTC)[reply]
I was thinking of a template or category that could be used to track them, but maybe it's just me. Circeus 16:44, 3 September 2007 (UTC)[reply]
What we need is something like the old "Oracle of Bacon" that counted the "distance" between a celebrity and Kevin Bacon via movie connections. We could do the same thing with two entries in the same language, but looking at the direction both ways. If the directed distance between two articles is 1 in both directions, then it needs a serious looking at. Of course, such a tool would be complicated by the synonyms sections, where we want that kind of "circular" linking. --EncycloPetey 00:18, 5 September 2007 (UTC)[reply]
There are (or at least were) at least two "six degrees of Wikipedia" tools that find the distance between Wikipedia articles (see w:Wikipedia:Six degrees of Wikipedia#External links). I imagine that one of these could be modified to work with Wiktionary. It could (probably) be limited to definitions by only reading lines starting with #. A tool that output a list of articles that were less than, say, 3 degrees apart would help find circular definitions - there would be false positives though, e.g. "A meaning B; XYZ" and "B meaning A; XYZ". Thryduulf 11:24, 5 September 2007 (UTC)[reply]
Another thought on this is that perhaps it could list the longest and shortest chains between the definitions. IIRC the tools give up finding chains after following 10 links (to avoid situations where A is defined as B, B is defined as C, and C is defined as B) but if it is that long the chances are it won't be a problem with circular definitions. Thryduulf 02:35, 6 September 2007 (UTC)[reply]
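
(Editorial illustration, not part of the original discussion.) The "directed distance" idea raised above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the link graph below is invented sample data rather than the real definitions of any entry, and the function names `mutual_pairs` and `directed_distance` are hypothetical. It assumes the graph has already been built by reading only the definition lines (those starting with #), and it uses a breadth-first search with a depth cap, loosely mirroring the 10-link cutoff mentioned in the thread.

```python
from collections import deque


def mutual_pairs(graph):
    """Return sorted pairs (a, b) whose definitions link to each other."""
    pairs = set()
    for a, targets in graph.items():
        for b in targets:
            # Distance 1 in both directions: a -> b and b -> a.
            if a != b and a in graph.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def directed_distance(graph, start, goal, max_depth=10):
    """Breadth-first link distance from start to goal.

    Returns None when goal is not reachable within max_depth links.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # give up on chains longer than the cap
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


# Hypothetical sample data: which entries each definition links to.
definition_links = {
    "walkway": {"passageway"},
    "passageway": {"walkway", "corridor"},
    "corridor": {"hall"},
    "hall": {"room"},
}

print(mutual_pairs(definition_links))
print(directed_distance(definition_links, "walkway", "room"))
```

As noted in the thread, a check like this would still report false positives (for instance, entries that legitimately name each other as synonyms alongside a fuller definition), so its output would be a list for human review rather than a verdict.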

Template:blend

What's going on with Template:blend? This no longer seems to add terms to their correct alphabetical position in Category:Portmanteaus but bundles most terms in an unordered list at the end. — Paul G 16:08, 3 September 2007 (UTC)[reply]

Now fixed, though it might take a while to clear the edit queue. (It was sorting entries by {{lcfirst:{{{PAGENAME}}}}}, which is simply {{{PAGENAME}}}. Now it will sort entries by {{lcfirst:{{PAGENAME}}}}, which is whatever the name of the entry is, except with a lowercase letter first.) —RuakhTALK 16:47, 3 September 2007 (UTC)[reply]
Thank you. Muppet and Socceroo (which both have an initial capital) are still coming out under "M" and "S" rather than "m" and "s" respectively. — Paul G 16:10, 6 September 2007 (UTC)[reply]
Muppet isn't using {{blend}}; it is directly included in Category:Portmanteaus. Socceroo was missorted because it was explicitly included in Category:Portmanteaus with no sort key, overriding the inclusion done by {{blend}}. I've removed [[Category:Portmanteaus]] from Socceroo and given Muppet a sort key manually. Mike Dillon 16:17, 6 September 2007 (UTC)[reply]
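
(Editorial illustration, not part of the original discussion.) The effect of the sort key described above can be imitated in Python. This is only an analogy under stated assumptions: it does not reproduce MediaWiki's actual category collation, the helper `lcfirst` is a hypothetical stand-in for the parser function of the same name, and the list of titles is sample data. It shows that a key that is the same literal string for every page gives no ordering at all, whereas a key with the first letter lowercased files capitalized titles among the lowercase ones.

```python
def lcfirst(title: str) -> str:
    """Lowercase only the first character, leaving the rest unchanged."""
    return title[:1].lower() + title[1:]


# Sample titles only; two of them start with a capital letter.
titles = ["Socceroo", "brunch", "Muppet", "smog"]

# Broken key: every page gets the same literal string, so a stable sort
# simply leaves the entries in their original order.
broken = sorted(titles, key=lambda _title: "{{{PAGENAME}}}")

# No key at all: capital letters sort before lowercase ones.
unkeyed = sorted(titles)

# Fixed key: capitalized titles are filed under their lowercase letter.
fixed = sorted(titles, key=lcfirst)

print(broken)
print(unkeyed)
print(fixed)
```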

Blocks too excessive

Blocks are aplenty here. They are excessive in length compared to similar situations on other foundation wikis, and another difference is that they (usually, from what I've seen) come without any warning. This isn't very encouraging to the "openness" that this wiki and the foundation are built upon, and discourages new users who are testing something out from ever becoming valuable members of this community.

But I couldn't say it better than Anthere (who, I have just learned is the chair of the WM Board of Trustees), who said the following (which I am copy/pasting from User talk:Cynewulf#Excessive block):

I personally find a block of one week for a rather stupid test, with NO previous warning comment on the talk page of the person, very much against our spirit of openness. Yes, a block wears off, yes, they will live, but their experience with Wiktionary will be unpleasant. Imagine that the editor was a young guy, maybe just playing/experimenting in a rather stupid way. What will he do afterwards? I am 100% sure that when he is unblocked, if still interested, he will transform himself into a bad editor, a troll, a bugger, a generally nasty person. Whereas, if you just left him a small gentle comment, saying "please, do not do that, you like the project? Do not damage it on purpose", I am pretty sure there is a significant chance that the editor might turn into a good guy. Never neglect the force of being nice to people. Remember wikilove.

I really didn't mean to come here to be a pain in the ass, I just stopped by, clicked the recent changes, and saw what in my opinion were blocks that didn't need to be done, where a simple "hello, welcome, hope you can contribute" would have a greater effect than blocks.

As for "Stupidity" in the block reason list, well, I hope you'll change that too. ZJH 18:38, 4 September 2007 (UTC)[reply]

FWIW I don't see the need for "general stupidity" there either -- everything I come across is either "personal attack" or otherwise talking about specific people, "nonsense/spam", or "deleting info".
For block lengths, what do we want to accomplish? Make everybody happy? I'd be happy if vandalism stopped. Do we want to change people? I don't know how to do that. Do we want to have an effect on people? A fifteen minute block isn't going to have any effect, and neither is an infinite one. A three-day block will hurt an addict, but what about somebody who doesn't care? Should we be thinking about this in terms of hurting people? (probably not) Will {{test}} on their talk page suddenly make people nice? Call me a pessimist, but I doubt it -- meanwhile we'll have more stuff from them to clean up. Bah, all this hullaballoo makes me wonder why I bother with vandal patrol. Every time I revert "poop" out of something somebody gets on my case. Well, enjoy your poop. Cynewulf 19:00, 4 September 2007 (UTC)[reply]
Case study: [1] (warning, obscene content). I haven't blocked this one yet. What action would create "significant chance that the editor might turn into a good guy"? To me, this looks like a 13-year-old kid going to the page of one of his "friend"'s names and talking trash. The IP belongs to a K-12 school in Florida. We can wait until he grows up, right? Cynewulf 19:07, 4 September 2007 (UTC)[reply]
I tend to lean toward warnings and assuming good faith, but in this case, I support Cynewulf's choice to block the anonymous editor for a week. He or she was clearly not trying to help this project and a one-week block is an effective way to minimize subsequent garbage. Several factors make vandal patrol here more challenging than elsewhere (e.g., Wiktionary has orders of magnitude more entries than other WikiMedia projects and its entries are in hundreds of languages, which complicates distinguishing vandalism from testing from useful edits). Rod (A. Smith) 19:21, 4 September 2007 (UTC)[reply]
  • There is a significant principle that each wiki project is independent. Wikipedia administrators are not given administrator privileges here, for many reasons. The culture conducive to building a usable dictionary is quite different from the "culture" over on Wikipedia. From Wiktionary's inception, it has had a much higher per-user vandalism level than any other WMF project. In theory, a dictionary definition, being brief, is a perfect outlet for a short "GOSH IS GAY" type entry. Wikipedia's permissiveness may be changing that by attracting more vandalism onto itself. Their "culture" is so troll-friendly now, little productive encyclopedic work is done anymore. What is it these days; 60% of Wikipedia traffic is entertainment related? The hard stance that Wiktionary has had (since long before I was an administrator here,) has helped to nip that in the bud. Yes, we get "normal" vandalism levels now, but I think Anthere has an enormous hill to climb with her argument, that over-permissiveness would somehow be beneficial to this project. Her disregard for the separate cultures of each member project is very disconcerting...it is as if she thinks all projects are Wikipedia? --Connel MacKenzie 19:27, 4 September 2007 (UTC)[reply]
    I don't know anything about wikipedia's vandalism level, but their administrator level is 1309, compared to 52 total here (and only what, three who do large amounts of vandal patrol?), and 55 total (27 active) on wikinews (whence ZJH). We don't have the manpower for much red tape. Cynewulf 19:38, 4 September 2007 (UTC)[reply]
    No, I do not think that all projects are Wikipedia. And I also know that even within one project, not all languages share the same needs, habits and rules. I still think that openness is one of the few values we have in common. Correct me if I am wrong on this. The day Wiktionary decides to restrict editing to those between 25 and 35 editing under their real names, all projects will have to discuss whether we want to stick together or each travel in its own direction. I also think that attitudes such as trust and discussion are among the few values we have in common. Again, if Wiktionary decides one day that only those with a university diploma and having shown their identity card are allowed to edit, then again, we'll have to discuss whether Wiktionary wants to go its own way. Last, I think that we share as a value the fact that the entire community is allowed to participate in the discussion over rules. Seeing Cynewulf's comment, I am suddenly wondering if rules are open to discussion with the community or restricted to administrators' input. Do I exaggerate? Perhaps a bit. But the issue is not about disregarding the separate cultures of each project. The issue is that we share some common values and each of us implements these values the way we see best. So, *my* comment will be that I do not find it a good idea to block someone for one week, for the first offense, without at least one warning. But I can live with that. What I find much more problematic is that when someone makes a comment to express a disagreement, he is answered "Interfering with administrators carrying out vandalism cleanup tasks is not recommended". Again, I may be wrong, but I like to think that on our projects (all of them), administrators are not using intimidation as part of their tactics to scare away newbies. Anthere
Anthere, I love you, what you do for WMF and your crazy idealism, but we aren't talking about a potential good contributor here; we're talking about a known-bad contributor, and a subsequent (unwarranted) accusation from User:ZJH. User:Cynewulf suggested the specific complaint be redirected from his talk page, to here, the centralized discussion area. While his tone may have been far from perfect, I think it is unfair to harp on one sentence of it. Better, instead perhaps, to harp on the ridiculous tone that started it, from User:ZJH. Would I have immediately gotten defensive in the same situation? Wouldn't you?
The Wiktionary WT:VOTE process is very much open to all Wiktionary contributors, (even if you would block me based on my age.) I think you have overstated your assumptions about some perceived lack of openness. --Connel MacKenzie 22:14, 4 September 2007 (UTC)[reply]
Nothing ridiculous about the tone I used when suggesting the block be lowered, which was wholly based on my opinion (ahh, linking words for people when they have no idea what they mean, that's what I like about Wiktionary ;) BTW, I suggest you look up the definition of community as well. ZJH 14:55, 5 September 2007 (UTC)[reply]
Your misplaced, defensive, sarcastic insults do not help here. Note that you are now on the defensive; lashing out. A very human reaction, don't you think? I said tone...since you obviously don't know what I mean, I'll be clear: the manner in which speech or writing is expressed. You initiated an adverse conversation with an accusation...sorry, but that is pretty dumb. You set the tone for the conversation by doing so. To pretend that you maintained some semblance of civility is nonsense. Couching your statements in weasel words does not change the tone; what you said was still an accusation. Soon after, you called in the posse to bail you out, once you were in over your head. Do you consider yourself part of this community? You seem to have an enormous amount of unjustified bravado. Offhand, I'd guess you are being disruptive. Upping the ante all the way to the board? For a justified, reasonable one-week block? Get a grip! --Connel MacKenzie 18:50, 5 September 2007 (UTC)[reply]

I think admins here would be more willing to issue warnings rather than blocks if there were a good way to keep an eye on a user. Currently, if I don't block an editor for vandalism that another admin probably would block him/her for, I feel somewhat responsible for any later vandalism by that editor (since I could have prevented it by blocking). If I had some sort of user-watchlist that let me watch contributions by editors I've sent warnings to, I'd be less concerned: I visit the site frequently enough that I'd see fairly quickly if a given non-block needed to be turned into a block. (I imagine that other admins would at least try out this possibility, and if you're right that a short note can end vandalism positively, then we'd get to see it for ourselves, and our blocking policies would lighten up quickly. Of course, if you're wrong, it probably wouldn't end up changing anything at all.) —RuakhTALK 20:33, 4 September 2007 (UTC)[reply]

Actually, I like Ruakh's suggestion about monitoring specific contributors, but for a different reason. In my case, I am more focused on my area of expertise, which is Asian languages in general (Mandarin in particular). I know of several users that add new Chinese words on occasion, but usually get the formatting wrong (formatting issues can be very confusing to an inexperienced contributor). If I could put something in my watch list that would allow me to know when that person made a new Chinese entry, I could then verify that the entry is properly formatted, plus I could make sure that I agree with the English definition. Currently, the only way to do this is to either frequently click on one of the recent changes links (wading through a bunch of non-Chinese edits), or click on the specific contributor's user contributions link. Neither of these two is a very elegant solution. -- A-cai 21:32, 4 September 2007 (UTC)[reply]
I can envision a toolserver task that floods stalker's watchlists with the stalkee's edits. It might meet some objections, though. --Connel MacKenzie 22:14, 4 September 2007 (UTC)[reply]
What would be better perhaps is a bot that sat on (say) the rc IRC channel and provided rss feeds on a per language basis. I have a bot watching that I later filter for language entries, but the filtering is done on my box at home, not on the fly by the bot. (Although I have also wished for an rc user-specific feed.) ArielGlenn 22:46, 4 September 2007 (UTC)[reply]
I'm not sure I'd like that solution, because then pages would suddenly appear on my watchlist without my knowing why. (I don't remember the username of every person I warn, and anyway, by the time I visit my watchlist the page might have been re-edited by a different user.) It would be better than nothing, though. (It might be good to restrict it to administrators, so as to ensure that no one creates an automated account that, say, reverts all of a certain user's edits. Though I guess someone could already do that by tracking the user's contributions page … has no one done this before? It seems like a great way to really piss off a user; you'd think some troll would have done it by now.) —RuakhTALK 00:53, 5 September 2007 (UTC)[reply]
FWIW, I agree broadly with Anthere's comments, and think that for various reasons our community here has become excessively closed and unwelcoming. This is not limited to blocks, but affects almost all aspects of the project. But I think the pragmatic concerns raised above deserve serious consideration as well. In understanding the differences between Wikipedia and Wiktionary culture, a look at the respective Special:Statistics pages is instructive (well, I found it instructive, anyway). Consider that on ENWP, there are roughly 1520 pages per administrator (and this even with ENWP's astronomically high standards for RfA); even given that many admin accounts like mine are inactive, this allows for a fairly high level of personal monitoring and attention. In contrast, here on ENWT we have roughly 12,500 pages per administrator; even if we were all highly active, monitoring that many pages would be a serious challenge. Now, to some extent this is offset by the fact that the overall rate of editing is fairly low, so that most changes can be patrolled through RC... but even so, once problematic edits have slipped through, they are likely to sit unnoticed for a very long time. This raises the stakes in vandal-fighting, and certainly contributes to the community's current crustiness. But it also raises the stakes in editor education and socialization, which is where I think we really need to focus. -- Visviva 03:13, 5 September 2007 (UTC)[reply]
...which means we could use more admins, which means that we have to be more open to newcomers, and even make some personal compromises on users who clearly wish to benefit the project, though their views may differ from our own. How many users like Thecurran have stumbled into an old debate they were completely unaware of? How many well-intentioned users would be put off entirely by a block, or even by an unexplained revert to an early edit, for but one minor flaw? DAVilla 03:52, 5 September 2007 (UTC)[reply]
Out of curiosity, what percentage of blocks are given to users with accounts, as opposed to anonymous users? --EncycloPetey 04:39, 5 September 2007 (UTC)[reply]
Sry, I've no idea how that could even be measured. Do you mean short-term username blocks (as opposed to infamous user's sockpuppets?) --Connel MacKenzie 05:01, 5 September 2007 (UTC)[reply]
Running some quick greps over the last 5,000 blocks and unblocks (which go back to mid-March) … it looks like 4,212 (84.24%) are blocks of anons, 35 (0.7%) are unblocks of anons, 704 (14.08%) are blocks of accounts, 37 (0.74%) are unblocks of accounts, and 12 (0.24%) are unblocks of numbers. (I don't know what that last one means, but there are 12 log entries of the form 01:30, 29 August 2007 Connel MacKenzie (talk | contribs) unblocked #22584 at various times and with various numbers, and one with Versageek instead of Connel. Note that there are no instances of anyone blocking a number.) —RuakhTALK 05:45, 5 September 2007 (UTC)[reply]
Those numbers are autoblock IDs. The software logs autoblocks publicly so they can be unblocked, but obviously conceals the user's IP with a unique ID. Dmcdevit·t 06:15, 5 September 2007 (UTC)[reply]
(re: Ruakh's stats) - Those numbers are skewed a bit by ~220 sleeper accounts belonging to one of our long-term vandals. I re-blocked these in early April. They had been blocked before for a time period which was less than "indef" and he had started reusing them for page-move vandalism. --Versageek 06:28, 5 September 2007 (UTC)[reply]
Of the username blocks, any way of telling how many were of persistent sockpuppeteers (i.e. indef blocks)? 10% still seems far too high. --Connel MacKenzie 06:38, 5 September 2007 (UTC)[reply]

I think it is fair enough that admins sometimes get smacked for blocking out of hand. Cynewulf was right to block the user in question, but I agree that a week was quite long for a first offence and the problem really was that when he was asked about it, he didn't react ever so well. Admins have to be accountable which means they should be able to account for their actions politely. Widsith 09:01, 5 September 2007 (UTC)[reply]

Preparation of Fundraiser 2007

Hi, this is just a first introduction message to tell you: there is more to come. I am dealing with the Project Management of the Fundraiser 2007 and therefore will search for contacts of wikimedians who can help us to do our tasks on all projects. I am actually also building the structure for the fundraiser on Meta. We will need people who help to design buttons, translate texts of buttons, documents, sitenotices etc. Should you feel you want to co-operate please let me know. You can reach me on my meta user page or by e-mail at scretella (at) wikimedia (dot) org. If you wish to notify us that you would like to co-operate on translations, it would be nice if you used e-mail and copied the e-mail to me and Aphaia (aphaia (at) gmail (dot) com). Thank you for your attention and I hope to meet you soon! Cheers :-) -- 4 September 2007 Sabine

Thank you Sabine! Is there a timeline for it? I don't think meta: will need much assistance for English translations.  :-)  --Connel MacKenzie 23:34, 4 September 2007 (UTC)[reply]
OT note: writing your email as user (at) something (dot) com doesn't help: you think the spammers haven't figured that one out? Send people to your user page where they can use the "email this user" link ;-) Robert Ullmann 14:45, 21 September 2007 (UTC)[reply]

West Frisian

The ISO 639-1 code "fy" is the collective code for the Frisian languages, so {{fy}} appropriately renders as "Frisian". Within that collection, there are some more specific ISO 639-3 codes, e.g. "frr" (North Frisian), "frs" (Saterland Frisian), and "fry" (West Frisian). Unfortunately, the ISO 639-2 code for the Frisian languages is also "fry", so {{fry}} also renders as "Frisian", leaving no English Wiktionary code for "West Frisian". Fortunately, no entries seem to use {{fry}} yet, so I'd like to update it to say "West Frisian" and require editors to use the ISO 639-1 code "fy" for the collective Frisian languages. Any objections? Rod (A. Smith) 17:40, 6 September 2007 (UTC)[reply]

Actually, fy is defined specifically as West Frisian, not Frisian in general.[2] --Ptcamn 22:38, 6 September 2007 (UTC)[reply]
I see. “Previous usage of code has been for Western Frisian, although language name was "Frisian"”. That's where the confusion arose. In that case, I'll just update {{fy}} and {{fry}} to reflect the more accurate name. Thanks, Ptcamn. Rod (A. Smith) 22:50, 6 September 2007 (UTC)[reply]
It seems we should also deprecate Category:Frisian language and its subcategories with "Frisian" in their titles in favor of Category:West Frisian language and the like, and change all "==Frisian==" headers to "==West Frisian==". Rod (A. Smith) 22:56, 6 September 2007 (UTC)[reply]
Er, correction. According to w:Frisian language, “ISO 639-1 code fy and ISO 639-2 code fry were assigned to the collective Frisian languages, but are as of 2006 used only for West Frisian.” Assuming we don't care for categories or 2nd level headings for the collective Frisian languages as a unit, I'll convert all plain occurrences of “Frisian” to “West Frisian”. Rod (A. Smith) 23:13, 6 September 2007 (UTC)[reply]

Choosing the “primary entry” for idiomatic phrases

Connel and I are disagreeing over which of these two forms: son of the manse or child of the manse should house the “primary entry” for this idiom. Connel favours son of the manse because it is the most common form, and I favour child of the manse because it is epicene, is the most general, and because its plural form, children of the manse receives more Google Book Search hits than sons of the manse does (260:240). More detail can be found here. Which is it to be? † Raifʻhār Doremítzwr 21:07, 6 September 2007 (UTC)[reply]

Note 1,334 b.g.c. for son of the manse. --Connel MacKenzie 21:34, 6 September 2007 (UTC)[reply]
(after edit conflict that nearly crashed my PC) (for the purposes of this comment: "A" is the most common singular form, "As" is its plural; "Bs" is the most common plural form, "B" is its singular; "C" is a less common singular form, "Cs" is its plural)
I'd say the primary entry should be at the singular form with the most widespread recent (i.e. last 5-10 years) usage ("A"). If the most common plural form ("Bs") is not the most common singular form's plural ("As"), then the singular of the most common plural form ("B") should be a soft redirect to the most common singular form ("A"). There should be usage notes at "A", "As" and "Bs". - i.e:
A: main entry with usage note
As: standard "plural of" entry with usage note
B: soft redirect to "A"
Bs: standard "plural of" entry with usage note
C: hard redirect to "A"
Cs: hard redirect to "Bs"
If any of this is hard to understand (likely) then let me know and I'll try again! Thryduulf 21:44, 6 September 2007 (UTC)[reply]
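One way to read the scheme above is as a small decision procedure. The following Python sketch is only an illustration: the function name `plan_entries` and the role strings are invented here, and the hit counts are the quoted-phrase Google Book Search figures given further down in this thread rather than a measure of "recent usage".

```python
def plan_entries(singular_hits, plural_hits, plural_of):
    """Assign a role to each form, following the A/As/B/Bs/C/Cs scheme.

    singular_hits / plural_hits map a form to its usage count;
    plural_of maps each singular form to its plural.
    """
    singular_of = {pl: sg for sg, pl in plural_of.items()}
    a = max(singular_hits, key=singular_hits.get)   # "A": most common singular
    bs = max(plural_hits, key=plural_hits.get)      # "Bs": most common plural
    a_s = plural_of[a]                              # "As": plural of A
    b = singular_of[bs]                             # "B": singular of Bs

    plan = {
        a: "main entry with usage note",
        a_s: "plural-of entry with usage note",
    }
    if b != a:
        plan[b] = f"soft redirect to {a}"
        plan[bs] = "plural-of entry with usage note"
    # Every remaining form is a "C" or "Cs" and becomes a hard redirect.
    for sg, pl in plural_of.items():
        if sg not in plan:
            plan[sg] = f"hard redirect to {a}"
        if pl not in plan:
            plan[pl] = f"hard redirect to {bs}"
    return plan


plan = plan_entries(
    {"son of the manse": 623, "child of the manse": 129},
    {"sons of the manse": 240, "children of the manse": 260},
    {"son of the manse": "sons of the manse",
     "child of the manse": "children of the manse"},
)
for form, role in plan.items():
    print(f"{form}: {role}")
```

With those figures, "son of the manse" is "A" and "children of the manse" is "Bs", so "child of the manse" comes out as the soft redirect.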
I like this, simple and straightforward. But heretofore we have commonly been using "one's" and similar locutions in primary entries, e.g. feel one's oats, which saves a lot of grief. That precedent would also support the use of child of the manse as the primary. -- Visviva 23:24, 6 September 2007 (UTC)[reply]
We have been using "one's" as a reflexive pronoun placeholder in entry titles, although "one's own" may be better. The non-reflexive version has been "someone's" and "somebody's". Rod (A. Smith) 00:18, 7 September 2007 (UTC)[reply]
Quite right. I guess my point was that we have been using a gender- and person-neutral form ("one's X" or "someone's X") rather than whichever form happens to be most common ("his X", "my X"). This seems obviously to be the right choice where pronouns are concerned; it is less clear whether the same logic applies in cases like the current one (leading to a preference for "child" over "son", perhaps "sibling" over "brother/sister", etc.) -- Visviva 01:18, 7 September 2007 (UTC)[reply]
Yes, Visviva, that logic, as well as the principle that words should be defined from the general to the specific, are the two cruces of my reasoning.
Also, a minor correction if I may: Connel hereinbefore provided a link to the hits page yielded by searching for son of the manse on the Google Book Search engine. He forgot to enclose the phrase in quotation marks, which meant that the engine searched for every instance of son + manse (as of and the are excluded from such searches), rather than the set phrase son of the manse. The correct statistics are:
  1. “son of the manse” = 623;
  2. “child of the manse” = 129;
  3. “sons of the manse” = 240; and,
  4. “children of the manse” = 260.
† Raifʻhār Doremítzwr 16:24, 7 September 2007 (UTC)[reply]
Well it's easy enough to explain son of the manse in terms of child, but if the main definition were at son, how would you explain child in terms of it? It seems the only other option would be to have two main entries, and the question becomes whether son should be a complete entry or rely on child. In these cases I prefer a little bit of explanation to avoid link-chasing, while keeping the linked term dominant with a fuller explanation. DAVilla 03:52, 8 September 2007 (UTC)[reply]
Yes, that is the “general → specific” rationale. Could this issue be solved by expanding the definitions for son of the manse and daughter of the manse to “A specifically male child of the manse; that is, a diligent and industrious man or boy” and “A specifically female child of the manse; that is, a diligent and industrious woman or girl”, respectively? –Is that acceptable to you, Connel? † Raifʻhār Doremítzwr 12:05, 8 September 2007 (UTC)[reply]
I think that would be an excellent solution. Cheers! bd2412 T 01:29, 10 September 2007 (UTC)[reply]
Not quite. Rather, at first blush, for this individual example, that almost seems to fit. But, it is still a pretty wild diversion from common practice - the practice of putting the main idiom entry at the primary entry form. The term son of the manse doesn't quite have more available citations than all the other variants combined but it clearly (very clearly) is the most common form. Wikilawyering reasons to slightly expand some of the soft-links doesn't address the underlying problem that brought this conversation here, to wit: the creation of specifically "incorrect" idiom forms while redirecting the "correct" form to that incorrect entry. The form "children of the manse" seems to be a cute reworking of the existing idiom - almost jocular. It is very misleading to our readers, to suggest that "children of the manse" should be preferred in their writings, over the understood idiom "son of the manse." The entire topic is difficult, as "son of the manse" is itself, such a rare idiom. But that fact, makes it more important that we identify the idiom that is likely to be understood, while giving little or no emphasis on the incorrect form. Simply following the existing practice of using the most common idiom form as the primary idiom entry, satisfies all these concerns.
Looking more closely into this curious UK idiom, specifically at w:Manse and its references, it seems to have a very strong monastery intended meaning. This reinforces the notion that "daughter of the manse" and "child of the manse" can be construed as intentionally humorous, jocular, sarcastic or satirical only. You, hailing from that side of the pond, should be proposing this; not me, an American. (Oh wait, our entry for "monastery" seems to include its antonym "convent" in its definition - as a subtype! - now? WTF?) --Connel MacKenzie 16:03, 16 September 2007 (UTC)[reply]
Furthermore, at WT:CFI#Idiomatic phrases it says "Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic form, based on the following principles:" followed by WT:CFI#Pronouns, where it clarifies: "Prefer the generic personal pronoun, one or one’s. Thus, feel one’s oats is preferable to feel his oats. Use of other personal pronouns, especially in the singular, should be avoided except where they are essential to the meaning."
To me, it seems pretty clear that "sons", "child", "children", "daughter" and "daughters" are all unquestionably "minor variants." Furthermore, it also seems clear (now) that "son" is in fact, essential to the meaning. --Connel MacKenzie 07:09, 20 September 2007 (UTC)[reply]

Any active bureaucrat?

Is there an active bureaucrat who could have a look at Wiktionary:Votes/bt-2007-08/User:VolkovBot for bot status please? --Volkov 19:14, 9 September 2007 (UTC)[reply]

Russian slang

I just tripped over a bit of silliness. Stephen doesn't like category "ru:Slang", thinks it should be "Russian slang", and I agree: it is Russian words that are slang, not Russian words about slang. Just like English nouns or French verb forms or Min Nan idioms. It is a grammatical attribute, not a topic.

The silliness was this edit in which he changed the middle "a" in "slang" to a non-latin letter to keep it from being categorized. There are other entries where he moved (slang) to the end of the definition line to keep AF from converting to the context tag. They all need to be fixed (the present contents of Category:Russian slang).

But then I think we should change the category class in {{slang}} so it will generate the language name forms. There are some languages that already have categories.

Similar issue with "Vulgarities", and there are several more that ought to be looked at. Robert Ullmann 14:02, 11 September 2007 (UTC)[reply]

The same argument would probably apply to everything under Category:Lexicons. None of them are topics. I'm not sure I'd call them grammatical attributes either; from what I can see they mostly pertain to issues of speech register or pragmatics. Mike Dillon 15:11, 11 September 2007 (UTC)[reply]
Seems reasonable. The naming convention should maybe be that categories concerned strictly with the non-contextual definition of a word (i.e. semantic attributes) use the language code, while all other attributes (syntactic, pragmatic, discursive) use the language name. Except for etymology. And except for all those mysterious categories that conflate discursive and semantic properties (is Category:Anatomy for the professional terminology of anatomists, or anything related to a body part?) Should Wiktionary:Categorization perhaps document these conventions? -- Visviva 04:42, 12 September 2007 (UTC)[reply]
That makes sense. What do you suggest for Category:Archaic, Category:Nonstandard, and so on (where the words are themselves archaic or nonstandard, not about archaic or nonstandard)? Should they be renamed to something like Category:English archaisms, Category:English nonstandard usages, and so on? (This has the awkwardness of making us seek a good noun equivalent for each adjectival descriptor, but we can always fall back on adding "usages".) —RuakhTALK 16:04, 12 September 2007 (UTC)[reply]
I think that Category:Archaic should be for archaic words. Archaic Russian words would be placed under Category:ru:Archaic. Check out Category:zh-tw:Archaic (the words in this category are archaic Chinese words written in Traditional Chinese script). Words about archaic would not get a special category since I doubt we could come up with more than a half a dozen words about the topic archaic. If I'm wrong about that, we could always start a new category called Category:words related to archaic. -- A-cai 17:07, 12 September 2007 (UTC)[reply]
Re: "Words about archaic would not get a special category since I doubt we could come up with more than a half a dozen words about the topic archaic.": Oh, certainly. This isn't about making room for other categories that better deserve these names, but rather about using more consistent names as it is. To me it seems like "this word is archaic" has more in common, in terms of the meaning of the category, with "this word is plural" than with "this word pertains to horses". —RuakhTALK 17:11, 12 September 2007 (UTC)[reply]

Translation into lemma only

Conversation moved to Wiktionary talk:Translations/Translation into lemma only to consolidate recent translation-related discussion. See Wiktionary talk:Translations. Rod (A. Smith) 22:18, 3 November 2007 (UTC)[reply]

Multiple context tags

What do we do in the case that a word exists in all regions with the same meaning, but needs a context tag specific to one region's peculiarity? I'm wondering if there is a general standard for this, but, in particular, if a word is formal only in one region, {{UK|formal}} produces "(UK, formal)" which to me implies that the word is British-only, and formal as well. I thought of the form "(UK: formal)," but I'm not sure if that's much clearer in saying that it's used elsewhere, but the context tag applies only to the one region; it's still not obvious that's not UK-only. "(Formal in the UK)," or is there a better way of doing it? Dmcdevit·t 02:46, 13 September 2007 (UTC)[reply]

I'd say the term has two distinct senses. A formal sense in the UK and a standard sense elsewhere. So, create two definitions. Rod (A. Smith) 03:34, 13 September 2007 (UTC)[reply]
What about when it is formal in Spain, and standard elsewhere? Spanish speakers can probably see where I am going with this. Do we want to break all second-person plural verb forms in all moods and tenses into two senses, one for Spain where it is formal-only, and one for the rest, where it is the standard form? Dmcdevit·t 03:59, 13 September 2007 (UTC)[reply]
That's probably not necessary since those are not lemma entries. Non-lemma entries probably just need definitions in terms of the grammar relationship to lemma entry (“(grammatical) third person plural form of...”), not in terms of any semantics. Rod (A. Smith) 04:09, 13 September 2007 (UTC)[reply]
Er, the lemma form for verb forms is an infinitive, which is most certainly standard. Since it is only this inflected form that has the variable meaning, we would be omitting necessary information by excluding the context tag on these. Dmcdevit·t 04:28, 13 September 2007 (UTC)[reply]
Of course the lemma form for Spanish verbs is the infinitive. I was discussing whether the definitions for the non-lemma third-person plural forms of every Spanish verb should be split according to how different countries treat the pronoun ustedes. The formality is a property of the pronoun, not a property of the verb inflection, so there is no need to indicate the regional formality differences anywhere but in ustedes and in an appendix. Rod (A. Smith) 06:40, 13 September 2007 (UTC)[reply]
Since the pronoun isn't the lemma, the only alternative is to inexplicably banish the formality and region of the word to an appendix, and not any of the other grammatical information. In any case, this is getting a bit off-topic, but marking these is already the current convention. I'd just like to come up with a better convention for these. Dmcdevit·t 07:35, 13 September 2007 (UTC)[reply]
Since this will apply to every such second-person form, my solution is to have a Usage notes section containing a template with a short message stating the situation and linking to an Appendix. --EncycloPetey 15:52, 14 September 2007 (UTC)[reply]
I've been thinking about the same thing; perhaps the best solution would be to create {{formal in UK}} and a matching Category:UK formalisms (or whatever); likewise {{UK slang}} and Category:UK slang (aha! I see that exists already)... we certainly have enough material to justify many of these intersections already, and this project is really just getting started. -- Visviva 03:39, 13 September 2007 (UTC)[reply]
I guess I misunderstood the nature of the concern... I don't have any opinion on cases like that discussed above (where regional formality is a global property of certain forms), but in other cases I don't see why we shouldn't put that information at the end of a definition rather than the beginning. That is, if a particular sense is widespread but is considered formal/informal only in one region, it should not be presented as modifying the entire definition. -- Visviva 06:31, 14 September 2007 (UTC)[reply]
The way to do it is to use {{context}}, so you can have {{context|formal in the|UK}}, which gives: (formal in the, UK). We just need DAVilla or someone to tweak the template so that the editor can tell the template not to insert the default comma. --EncycloPetey 15:52, 14 September 2007 (UTC)[reply]
The “_” argument does the trick. E.g.: {{context|formal in the|_|UK}}, which produces, “(formal in the UK)”. Rod (A. Smith) 17:26, 14 September 2007 (UTC)[reply]
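The behaviour described in the last two comments (labels joined by a comma by default, with “_” suppressing the comma) can be modelled in a few lines. This Python sketch is only a model of that described output, not the actual implementation of {{context}}; the function name `render_context` is made up for illustration.

```python
def render_context(*labels):
    """Join context labels with ", " unless a "_" argument asks for a plain space."""
    out = ""
    glue_next = False  # True immediately after a "_" argument
    for label in labels:
        if label == "_":
            glue_next = True
            continue
        if out:
            out += " " if glue_next else ", "
        out += label
        glue_next = False
    return f"({out})"


print(render_context("UK", "formal"))
print(render_context("formal in the", "UK"))
print(render_context("formal in the", "_", "UK"))
```

The three calls correspond to the three wikitext forms discussed above: {{UK|formal}}, {{context|formal in the|UK}} and {{context|formal in the|_|UK}}.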

Wiktionary:About languages

At Wiktionary talk:About Greek, attention was drawn to the fact that this type of page would more suitably reside in "Help:". I tend to agree, although many references agree with the current locations: Wiktionary:Language considerations, Wiktionary:About Persian, Wiktionary:About Latin etc. Any move would need to be coordinated with others.

Question
should "About Greek" be moved, and where? —Saltmarsh 14:47, 13 September 2007 (UTC)[reply]
The way I understand it, some of the content (i.e. "typing in greek") can and probably should go in "help", but everything that has to do with how Wiktionary deals in wording or formatting the specificities of a language should stay at "about X". In the case of Persian, "transliteration" would stay where it is (if it is ever adopted). Circeus 15:16, 13 September 2007 (UTC)[reply]
My understanding of these pages was that they were supposed to provide policy clarification. That is to say, only listing special concerns about a language that supersede the regular rules in WT:ELE. This provides a quick summary of what is acceptable, both for newcomers interested in entering terms, and sysops (like me) who may not be familiar with a language, but see something that looks wrong with a particular entry.
I admit that some of the other "About" pages have also diverged from that purpose, quite a bit. I guess my question is twofold: #1) do we want these "help-ish" guides in the about pages? #2) Where do we want to keep the brief summary of "the language concerns that override regular WT:ELE rules"?
TIA. --Connel MacKenzie 19:00, 13 September 2007 (UTC)[reply]
Good points. Another factor to keep in mind is that we have two audiences to consider: editors and readers. To me, the Wiktionary namespace seems intended for editors, while the Help namespace seems intended for readers. Rod (A. Smith) 19:08, 13 September 2007 (UTC)[reply]
Hmm. A quick glance at the Help namespace shows I'm wrong about its purpose. Rod (A. Smith) 19:09, 13 September 2007 (UTC)[reply]
The namespace you're looking for is Appendix:, and that's where information about the language should be placed if it is directed at readers. --EncycloPetey 15:44, 14 September 2007 (UTC)[reply]
Most of the information in the "About" pages is directed towards editors rather than readers. So what does go in the Help namespace, anyways? ArielGlenn 19:23, 15 September 2007 (UTC)[reply]
Not sure. I seldom see it used, so there's no pattern to see what it's supposed to be used for. I was editing here for almost a year before I even knew it existed. I assume that it's for technical information. --EncycloPetey 18:23, 16 September 2007 (UTC)[reply]

References

I have used the <ref> html tag (if it is html - it isn't in my 2002 manual) in the article χούφτα - it creates a footnote type reference which gives detail to the reader. It is commonly used in Wikipedia. Whereas the references in gobbet make it difficult to see what was sourced where. Is my departure deprecated? (For technical reasons, but not because it creates more work :).) Should I stop doing it? —Saltmarsh 14:44, 14 September 2007 (UTC)[reply]

I've wondered about this too, Saltmarsh. As Wiktionary entries grow more complete and complex, it seems to me to be increasingly desirable to be able to tie specific parts of an entry to their specific sources (as, for example, in ambulance chaser). However, it strikes me as awkward and unsightly for both footnotes and the more common sorts of references to other dictionaries to be combined under a single header (as in however). -- WikiPedant 14:57, 14 September 2007 (UTC)[reply]
I’ve been using in-line references too. I think they’re vital for showing exactly which information is sourced and whence — be it an etymology, a pronunciation, an irregular inflexion or conjugation, a context tag, or whatever. In the case of however, I’d move the nine bulleted references to a “Dictionary notes” section whereunder they’d be more suitable. † Raifʻhār Doremítzwr 15:05, 14 September 2007 (UTC)[reply]
Take a look at how I just subdivided the references in however and see what you think. I think subheaders under "References" is the best way to go, since moving the bulleted references that cite other dictionaries to a whole new section would be inconsistent with the style of a gazillion other WT entries. -- WikiPedant 15:26, 14 September 2007 (UTC)[reply]
Please, forgive my bluntness, but the list of references for however is utterly useless. There is no indication at all as to what information came from any of the listed sources or how the sources were used as a reference. The point of a reference is to (1) give credit to the source of information used, and (2) bolster statements of fact by making it possible for someone to verify the research. A bulleted list of web sites at the end of an entry fulfills neither of these purposes. Subdividing it does nothing to help. --EncycloPetey 15:42, 14 September 2007 (UTC)[reply]
Agreed. However, rather than using actual subheaders, use code like ;Dictionaries and ;Notes instead — the result is virtually identical, but has the added benefit of getting rid of the “edit the section” buttons on the right (which are for some reason bigger for the sub-sub-sub-…-headers) and will probably stop Autoformat going nuts with its rfc-invalid header templates. † Raifʻhār Doremítzwr 15:38, 14 September 2007 (UTC)[reply]
Yes, Doremítzwr, the semicolon command clearly works much better. I disagree with EncycloPetey that the bulleted links to other dictionaries are useless, though. They are just what the doctor ordered for users like me who frequently want to get a quick take on other defns to compare/contrast with what they read on the WT page (which strikes me as consistent with Petey's point (2) above). One other technical matter -- I notice, Doremítzwr, that when you edited however you changed "References" to L4. This conforms to WT:format, but those examples show separate "References" headers under each meaning (which I'm not so sure I've ever actually seen in Wiktionary). On the other hand, the template provided when the user hits, say, the "Noun" button on the create entry screen shows an L3 "References" header (which makes more sense to me, if the "References" section comes at the end of the entire entry). Are both header levels acceptable, depending on the placement of the section? -- WikiPedant 16:13, 14 September 2007 (UTC)[reply]
I think I should clarify that the bulleted list is useless as references. They are fine for use as External Links, but they are not useful as References. --EncycloPetey 00:57, 15 September 2007 (UTC)[reply]
I think Cite.php (the ref-references system) is very useful in certain sections: etymology (as in your example), usage notes, and perhaps pronunciation. These are areas where our usual principle of verification from use is difficult or impossible to apply. However, in re the discussion above, I think we need to continue aggressively questioning the practice of "referencing" whole entries to third-party dictionaries. Copyright concerns aside, it's just sloppy practice; our entries should stand or fall on their own merits. -- Visviva 15:58, 14 September 2007 (UTC)[reply]

I'm very much in theoretical support of <ref>, but Cite.php has the serious bug that multiple uses of <references>-s don't work properly — see User:Ruakh/Cite for a demonstration — so we can't have a separate "References" section for each language. If we can get that bug fixed, then <ref> is the way to go. —RuakhTALK 22:53, 14 September 2007 (UTC)[reply]

In the meantime, with only one <references/> tag available, the refs would have to be at the end of the page. With multilanguage pages this would necessitate an L2 References header; is this possible/advisable? Since the superscript number and related up-arrow allow easy movement between text and reference, perhaps the normal reader would not be bothered where the references were? —Saltmarsh 05:42, 15 September 2007 (UTC)[reply]
Has the bug been reported to the technical people? If so, do they have any idea when it’ll be sorted? If sourcing information for multiple languages under one references section, then yes, sticking them at the end under an L2 header seems like the logically best option — will Autoformat et alia be OK with that though? In the meantime, is developing <ref1 name="">, <ref1>, <references1/>, </ref1>… commands possible / workable / useful? † Raifʻhār Doremítzwr 15:28, 15 September 2007 (UTC)[reply]
Yes. It's bugzilla:6271. I'll add a comment about how this is useful for Wiktionary because of the multi-language issue. Feel free to vote for it and it might get some love. Mike Dillon 20:51, 15 September 2007 (UTC)[reply]
P.S. The proposed functionality would look something like this in the wikitext:
== English ==
...
...<ref name="XXX" group="en">....</ref>
...
<references group="en"/>
...
----
== German ==
...
...<ref name="XXX" group="de">....</ref>
...
<references group="de"/>
The only part that would be slightly onerous would be that the editor would have to manually put group="XX" onto every <ref> and <references> tag; there is no way to get the software to do it automatically based on our section conventions. Mike Dillon 21:03, 15 September 2007 (UTC)[reply]
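Although the software itself would not add the attribute, an editor-side script or bot could in principle do it before saving. The Python sketch below is a rough illustration under stated assumptions: the helper name `add_groups` and the tiny `LANG_CODES` table are hypothetical, the `group` attribute is the proposed (not yet available) syntax from the example above, and only level-2 headers of the form "== Language ==" are recognised.

```python
import re

# Illustrative mapping only; a real tool would need the full language list.
LANG_CODES = {"English": "en", "German": "de"}

# Matches level-2 headers such as "== English ==" but not "=== Noun ===".
HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$")
# Matches an opening <ref ...> or <references ...> tag that has no group= yet.
TAG = re.compile(r"<(ref|references)\b(?![^>]*\bgroup=)")


def add_groups(wikitext):
    """Add group="xx" to every <ref>/<references> tag, by language section."""
    code = None
    out = []
    for line in wikitext.splitlines():
        m = HEADER.match(line)
        if m:
            code = LANG_CODES.get(m.group(1))
        elif code:
            line = TAG.sub(lambda t: f'{t.group(0)} group="{code}"', line)
        out.append(line)
    return "\n".join(out)


sample = (
    "== English ==\n"
    'Some text.<ref name="XXX">A source.</ref>\n'
    "<references/>\n"
    "----\n"
    "== German ==\n"
    "Some text.<ref>Another source.</ref>\n"
    "<references/>"
)
print(add_groups(sample))
```

Tags that already carry a group attribute, and closing </ref> tags, are left untouched.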
Sounds good! —SaltmarshTalk 05:59, 16 September 2007 (UTC)[reply]
They don't need to be grouped; it would suffice if the references/ tag would clear the list when it generates them, so the next section starts anew. It doesn't need to know anything about our section conventions. Reading the bugzilla shows other intended uses though, and we could certainly use the groups if that was provided. As Mike points out though, this is more work that we don't really need. Robert Ullmann 14:38, 21 September 2007 (UTC)[reply]
That's a good point and it would probably need a separate bugzilla. I'd expect that having the tag clear by default isn't a slam dunk with the MediaWiki developers, but I could see a clear="true" flag being accepted, or even a setting in LocalSettings.php. I can't think of any reasons to use multiple <references> tags without having it clear out, but the possibility of unintended regressions probably means that a separate attribute or a flag in the settings would be easier to get into the code. Mike Dillon 00:43, 22 September 2007 (UTC)[reply]
Wait, I just thought of a complication. Some state would need to be maintained to avoid generating invalid duplicate ids for the forward reference links and backlinks. Mike Dillon 00:44, 22 September 2007 (UTC)[reply]
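To make the complication concrete, here is a toy Python model of the "clear on output" idea. It is not Cite.php code; the class name, method names and the id strings are assumptions chosen for illustration. The visible footnote numbers restart after each references list, while a page-wide counter keeps the anchor ids unique.

```python
class Footnotes:
    """Toy model: <references/> clears the list, but ids stay unique per page."""

    def __init__(self):
        self.pending = []   # notes collected since the last references list
        self.next_id = 1    # page-wide counter, never reset

    def ref(self, text):
        note_id = self.next_id
        self.next_id += 1
        label = len(self.pending) + 1   # visible number restarts per section
        self.pending.append((note_id, text))
        return (f'<sup id="cite_ref-{note_id}">'
                f'<a href="#cite_note-{note_id}">[{label}]</a></sup>')

    def references(self):
        items = "".join(
            f'<li id="cite_note-{i}"><a href="#cite_ref-{i}">^</a> {t}</li>'
            for i, t in self.pending
        )
        self.pending = []   # the "clear" step discussed above
        return f"<ol>{items}</ol>"


page = Footnotes()
english_marker = page.ref("English source")
english_list = page.references()
german_marker = page.ref("German source")
german_list = page.references()
print(english_marker, english_list, german_marker, german_list, sep="\n")
```

Both sections show a footnote numbered [1], yet the underlying ids differ, which is the extra state that would have to be maintained.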

Hanja entries

For anyone interested, a conversation about hanja entries has been moved from User talk:Connel MacKenzie#about Hanja to Wiktionary talk:About Korean#Hanja entries. Rod (A. Smith) 19:09, 14 September 2007 (UTC)[reply]

Entries for letters of the Latin (Roman) alphabet

Letters of non-Latin (non-Roman) alphabets have their own entries, e.g. the Cyrillic letters а and б, but incredibly, letters of the Latin (Roman) alphabet do not. The entries at a and b, for example, have definitions for various words and symbols with the spellings a and b, but nothing for the letters themselves. Is that omission by choice or oversight? Rod (A. Smith) 23:15, 14 September 2007 (UTC)[reply]

I think all our alphabet listings have suffered from the general lack of consistency. Once a year or so, someone volunteers to plow through them all, giving up when the problems become intractable, or too many people complain. (E.g. Arabic Alphabet.) I would like to see some consistency in how we handle these. Listing them all as symbols (===Symbol===) in a "Translingual" section seems like the most comprehensive, universal approach. The individual definition lines there can describe what language (or language families) use those characters. (Right?) --Connel MacKenzie 01:02, 15 September 2007 (UTC)[reply]
Agreed. Though, can we have a ===Character=== header instead? "Symbol" is the Unicode term for a certain subset of the characters (distinguished from letters, marks, numbers, punctuation, separators, and control characters), so it seems a bit awkward to use it more broadly. —RuakhTALK 03:19, 15 September 2007 (UTC)[reply]
To me, a character is simply a minimal written unit of text (so it describes digits, letters, whitespace, symbols, ideograms, etc.) but a symbol must actually symbolize something specific, so is technically not accurate for letters themselves. From a brief conversation I had in IRC, though, I get the impression that others' definitions don't necessarily distinguish between the terms. Whatever we use, it should be understandable to readers and should be described in Appendix:Glossary. Rod (A. Smith) 07:44, 15 September 2007 (UTC)[reply]
Please see what I've done with the entry for the letter a. I used the heading "===Character===" as an example. The "etymology" describes briefly the roots of the character's shape. The definition line explains that this particular character is a lower-case letter, as opposed to other characters that may be digits, symbols, ideograms, etc. I also show some of the more common or well known derivations, using a link to the appendix for further exploration. Comments? Rod (A. Smith) 21:56, 15 September 2007 (UTC)[reply]
I think that this looks good - I raise a couple of points: (1) Do the "===Character===" and the "===Abbreviation===" have different Etymologies? (2) With lists of chars eg (à, á, â, ā, ä, å) "commas as separators" may easily become confused with "commas as modifiers/diacritics" as in the list a, a, α', , ά - where is confusing - should the separators be omitted? —SaltmarshTalk 06:37, 16 September 2007 (UTC)[reply]
Good suggestion, Saltmarsh. It looks much cleaner without the commas. Since the entry and the list items are single characters, nobody will be confused.
As for etymologies, though, there really is more than one etymology. Well, "etymology" in the Wiktionary sense of the word (just as "part of speech" includes "proverb" here). The origin of the letter is a different letter, but the origin of the abbreviations is the words they abbreviate. Is there a better way to present the origin of the letter? Rod (A. Smith) 09:58, 16 September 2007 (UTC)[reply]
I like that "===Character===" accurately describes the smallest units of language that we describe here.
  • ===Symbol=== is wrong because strict definitions of symbol, e.g. that used by Unicode, exclude meaningless letters, while looser definitions include entire words.
  • ===Letter=== is wrong because letter excludes punctuation and non-alphabetic scripts.
  • ===Grapheme=== is wrong because we don't want to describe any particular font, but the abstract, underlying concept.
Anyway, the possibility of adopting the header "===Character===" brings to light some other potential types of entries that are smaller than a word:
  • Digraphs, e.g. Spanish ch and ll, and trigraphs for that matter. Note: these are different from ligatures, which are technically characters.
  • Morse code sequences, e.g. •- (dot-dash, a.k.a. di-dah, “A”).
Should we have a unique POS header for these entries that are smaller than words/morphemes but larger than characters? Rod (A. Smith) 22:53, 16 September 2007 (UTC)[reply]
As I said before, there is significant inconsistency in the existing entries. BUT, that doesn't mean anything new is needed. Currently, ===Symbol=== and ===Letter=== are used fairly extensively. Since letters are symbols themselves, it makes sense to me, to use the general purpose heading "Symbol" for this class of entry. But, if you were to disregard that concern, the sensible solution would be to use "Symbol" and/or "Letter" as appropriate...just as WT:POS suggests. But that, I think, would be a mistake. Describing items in more detail on definition lines is better than randomly adding (rarely used) headings.
To address the notion of Unicode: how they define "Symbol" is fine for them, but not adequate for our purposes. For comparison, just as CGEL calls all nouns "noun phrases", we don't use others' specialized terminology. Instead, as needed, we have our own specialized terminology (e.g. all "noun phrases" here are called "===Noun==="s.) So, anyone quoting the CGEL, saying that nouns don't exist (or some other thing that might be appropriate only in a CGEL context,) is bound to have trouble here, where our concerns dictate the opposite nomenclature. Likewise, even though the kind people of the Unicode consortium are very intelligent, their goal is not to define "all words in all languages." So adopting their terminology, suitable to their goal, is not helpful.
For the ~20 "good" headings, the ~25 "acceptable" headings, ~20 explicitly "deprecated" headings, the ~100 automatically "corrected" heading errors, we still have some 1,174 third-level headings in use, total. (In the main namespace only, not counting 4th level POS headings, etc.) I don't think people realize how useless it is to have unusable data like that lying around. Being so wild causes all such entries to simply be excluded from all derivative works (e.g. http://www.panimages.org/, http://ninjawords.com/, yawiktionary, etc.) It also causes those entries here to be mis-categorized, mis-corrected, misplaced, miscounted and discounted. That is to say, there is a very desperate need to consolidate the existing headings, not ever to encourage more. From where I sit, anyone proposing new headings is subtly seeking to destroy the usefulness of en.wiktionary.org, or en.wiktionary.org itself. But, perhaps I've been staring at code too long. (Certainly Stephen thinks I have.) But I'm not even talking about parsing Wiktionary entries into a fine-detail level; I'm only talking about the highest level structure...even that isn't contained!
To say the situation is frustrating doesn't even begin to describe it. Anyone desiring additional headings should first build a time machine and go back a few years, to propose the heading in 2003/2004. For now, our main concern should be eliminating the 1,000+ invalid headings, and setting policy to ensure they don't resurface.
--Connel MacKenzie 02:43, 17 September 2007 (UTC)[reply]
Thanks for that insight into where you're coming from when you get angry about proposals for new standard headers. I think your position is misguided, though: each additional standard header probably means the use of a dozen fewer headers, because suddenly there's a usable standard header to replace a bunch of nonstandard headers with. (If a word doesn't even remotely fall under any of our existing headers, then editors will use a non-standard header for it, and resist any attempt to replace it with a useless-but-standard header; and worse, different editors will use different non-standard headers for the same kind of word. If we cover such classes of word with a few additional headers, suddenly a lot of that variety gets handled quite easily.) —RuakhTALK 04:56, 17 September 2007 (UTC)[reply]
I'm not speaking theoretically (as your postulation surely is.) My experience here on en.wiktionary shows the exact opposite to be true; for each "standard" heading added, at least a dozen variants (even more tangential) and many dozens more typos of those, start being used. The majority of those, standard or tangential, are soon abandoned. --Connel MacKenzie 05:07, 17 September 2007 (UTC)[reply]
Hmm, I somehow missed the fact that WT:POS includes "===Letter===". I also didn't realize that adding a new standard header was so complicated (and I still don't fully grasp the extent of the work that must be involved), but I'm relieved to learn that "===Letter===" is already valid. Fortunately, that solves my Spanish digraph problem, since ch and ll are letters but are not characters. So, I will use "===Letter===" for Latin (Roman) alphabet entries. Since "===Symbol===" is the heading we need to use for entries that are not letters but are smaller than morphemes, please define our unique sense of symbol in Appendix:Glossary and update WT:POS to link there. (It currently links to our dictionary entry for symbol.) Rod (A. Smith) 05:52, 17 September 2007 (UTC)[reply]
I only get 520 distinct headers in error, with 16142 instances, including ~7000 "X form" headers. There are 834031 L3 headers in the wikt NS:0, this is not a big error rate (1.1%), although we should do better. And a few of those are needed POS headers. Robert Ullmann 14:26, 21 September 2007 (UTC)[reply]
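The 1.1% figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the rate was computed after setting aside the roughly 7,000 "X form" headers, which is a guess based on the numbers given, and the variable names are illustrative.

```python
# Figures quoted in the comment above.
total_l3_headers = 834031   # all L3 headers in NS:0
error_instances = 16142     # instances of headers counted as errors
x_form_instances = 7000     # approximate count of "X form" headers


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Rate if every flagged header is counted as an error.
all_errors_pct = percent(error_instances, total_l3_headers)

# Rate if the "X form" headers are excluded (assumed reading of the 1.1% figure).
excluding_x_form_pct = percent(error_instances - x_form_instances, total_l3_headers)

print(round(all_errors_pct, 1), round(excluding_x_form_pct, 1))
```

Counting every flagged instance gives a rate closer to 2%, so the quoted figure seems to depend on treating the "X form" headers separately.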

Script templates.

I'd like to write a bot that creates {{Deva}} (for Devanagari), {{Ethi}} (for Ethiopic, i.e. Ge'ez), and so on for all the ISO 15924 four-letter script names (listed here), except for Zxxx, Zyyy, Zzzz, and (obviously) any that already have templates. This bot wouldn't be particularly intelligent; it would just go through the list of four-letter codes, check if the template exists already, and if not, set it to (for example) <includeonly><span class="script-Deva" >{{{1}}}</span></includeonly><noinclude>This template may be used to enclose text in the Devanagari (Nagari) script; it may be called directly, as <tt>{&#x7B;Deva|<var>text</var>}}</tt>, or may be passed via the <tt>sc</tt> parameter to templates that support that parameter, such as {{temp|t}} or {{temp|term}}. Note that text in a non-Latin script should ordinarily be accompanied by a romanization. [[Category:Script templates|Deva]]</noinclude> (Later, these templates might be modified by people familiar with each script, setting good default fonts, linking to relevant language-considerations pages, mentioning similar-but-distinct templates — like how we have a {{KUchar}} separate from {{Arab}} and a {{polytonic}} separate from {{Grek}} — redirecting to better templates, etc.) I will check and patrol every single one of the bot's contributions during this process, and will fix any major mistakes manually, so that's not a concern; but before doing it, I'd like to make sure that other editors agree this is something that should be done. (By the way, if I do this, it will be as Rukhabot.) —RuakhTALK 00:47, 15 September 2007 (UTC)[reply]
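The selection logic described above might be sketched roughly as follows. This is not Rukhabot's actual code: the function names, the placeholder markers, and the shortened documentation text are assumptions made for illustration, and the sketch only plans page contents without contacting any wiki.

```python
# Codes that the proposal explicitly skips.
EXCLUDED_CODES = {"Zxxx", "Zyyy", "Zzzz"}

# Shortened form of the proposed template body; @CODE@ and @NAME@ are
# hypothetical placeholders filled in per script.
TEMPLATE_BODY = (
    '<includeonly><span class="script-@CODE@" >{{{1}}}</span></includeonly>'
    "<noinclude>This template may be used to enclose text in the @NAME@ script. "
    "[[Category:Script templates|@CODE@]]</noinclude>"
)


def build_template_text(code, script_name):
    """Fill the template body in for one ISO 15924 script code."""
    return TEMPLATE_BODY.replace("@CODE@", code).replace("@NAME@", script_name)


def plan_templates(script_names, existing):
    """Map each template title to be created to its wikitext.

    script_names: dict of four-letter code -> script name.
    existing: set of codes that already have a template.
    """
    plan = {}
    for code, name in script_names.items():
        if code in EXCLUDED_CODES or code in existing:
            continue  # skip special codes and templates that already exist
        plan["Template:" + code] = build_template_text(code, name)
    return plan


# Example: Grek already exists and Zyyy is excluded, so only Deva is planned.
example_plan = plan_templates(
    {"Deva": "Devanagari (Nagari)", "Grek": "Greek", "Zyyy": "undetermined"},
    existing={"Grek"},
)
print(sorted(example_plan))
```

Keeping the planning step separate from any page-saving step would make it easy to review the full list of pages before the bot edits anything.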

The recent discussions on WT:GP about existing language templates show a strong desire to have these prefixed in a meaningful way. For example, I think {{l-Deva}} would be better. --Connel MacKenzie 00:52, 15 September 2007 (UTC)[reply]
I don't have a strong opinion either way about whether to prefix the script templates, but it would be nice to have them all available. I would suggest, though, having the bot write that brief documentation to the talk page instead, surrounded by a "=Documentation=" line and a "=Discussion=" line. Rod (A. Smith) 00:56, 15 September 2007 (UTC)[reply]
Good call, will do. —RuakhTALK 19:53, 15 September 2007 (UTC)[reply]
Hmm. The ones we currently have don't have any sort of prefix; seeing as they'd all be in one category anyway (Category:Script templates), do you object to my creating these now for consistency, and moving them later if we decide on a different naming scheme? (If you do object, I'll hold off until there's a clear consensus on how exactly they should be named.) —RuakhTALK 19:53, 15 September 2007 (UTC)[reply]
Please use a coherent prefix (AKA a pseudo-namespace within the template namespace) for these. These collectively represent a fairly specialized use; having them together (not just categorized) means that erroneous additions can be caught and corrected. Having them spread across the (crowded) template namespace, means nonstandard additions will likely go unnoticed indefinitely. --Connel MacKenzie 05:14, 17 September 2007 (UTC)[reply]
Thanks for your reply. I'll just wait, then. I actually tend to agree with you that we should have a prefix for these, but since other people have expressed disagreement, and the existing ones don't have a prefix, and anyway I don't want to be in charge of deciding what prefix to use, I'd rather just not worry about it until we seem to have consensus on whether and what prefix to use. —RuakhTALK 05:37, 17 September 2007 (UTC)[reply]
On my screen modern Greek script shows up correctly without filtering through a template. (1) Why do we need to use it? (2) Should it be used for all occurrences of mGreek? The output looks more attractive, but the labour entailed is considerable. I feel that the answers need to be given for simple souls like myself AND it should be detailed under the Documentation for each template, for those who follow. (with apologies if this has been done elsewhere) —SaltmarshTalk 12:44, 15 September 2007 (UTC)[reply]
I had a longish conversation with User:Rodasmith on IRC earlier about the usage of {{Grek}} (or rather, how I don't ever use it), before he added the documentation. I don't think there was any particular use in mind... (Rod please correct me if that's wrong), just making the template available in case someone *does* want it for something. ArielGlenn 19:29, 15 September 2007 (UTC)[reply]
I don't think anyone expects that all users will make use of script templates; but if they all exist, then bots can start adding them appropriately. (Some things might require more intelligence than a bot, but I think a fairly simple bot could handle the great majority of cases, while leaving alone cases that it can't.) At any rate, I'm not willing for my bot to add documentation of how the templates should be used, because I don't know that we have an answer for that yet, and indeed, I don't know that we'll have the same answer for all scripts; rather, at this point it will just add documentation on how they can be used. (After all, template talk-pages don't exactly constitute policy pages, anyway; it's up to real policy pages to tell people what templates to use.) —RuakhTALK 19:53, 15 September 2007 (UTC)[reply]
Correct. Most modern browsers display Greek fonts just fine without any particular script selection, so there is no requirement to use {{Grek}}. I have modified the documentation to clarify that. Rod (A. Smith) 21:35, 15 September 2007 (UTC)[reply]
I absolutely agree - there is no point in making life more complicated with unnecessary templates. Can {{Grek}} be phased out or at least marked as deprecated? —SaltmarshTalk 09:40, 16 September 2007 (UTC)[reply]
We should not make Greek the only script without a template. λόγος looks significantly different from (and more legible than) λόγος on my browser. Does it look better or worse on yours? In any event, the presence of {{Grek}} shouldn't make anyone's life more complicated. Each script defined by ISO 15924 really does need its own script template. Eventually, those templates will be applied consistently, allowing us to provide the best possible fonts for displaying each word. Readers will even be able to choose their favorite font for each script. {{Grek}} is marked as optional. Rod (A. Smith) 09:52, 16 September 2007 (UTC)[reply]
Sorry Rod, I ain't trying to be awkward :). I agree the Grek λόγος looks better than the bog standard one, but how can a template be applied automatically when λόγος is the same in Greeks Ancient and modern - [λόγος : λόγος] context will not be sufficient. Also, we shall need to think through where, when there is o/p through a template, the font will be applied. And, as Connel mentions above - we should rename the templates methodically: f-el, f-grc etc. You say that {{Grek}} is marked as optional, but surely if we use it, we should apply it to all new i/p. —SaltmarshTalk 14:38, 16 September 2007 (UTC)[reply]
The font specification is just "temporary" (with a timeline as long as browser versions and OS versions come and go, so the fonts will be here for quite a while); the end-point to get to is to apply the XHTML tag "Grek" and let the browser do what it will. The names should not be given a fixed prefix; they are designed to work with the language codes. (You think we are the only ones using this stuff??!!) Modern Greek should be {{Grek}}, Ancient Greek (now polytonic) should be {{grc-Grek}} which is the standard IETF/ISO/XHTML tag. This isn't complicated; and all the coding is done for us. Robert Ullmann 16:02, 16 September 2007 (UTC)[reply]
None of those standards organizations or standards specify "{{" nor "}}" for these. Since we are using their standard names for our purposes, it only makes sense to use an organizational prefix. I think "s-" would be fine, as would "p-". Neither {{s-}} nor {{p-}} are used now, nor are likely to be needed for anything else. So, {{s-Grek}} & {{s-grc-Grek}} would be a lot less ambiguous in this en.wiktionary.org context. --Connel MacKenzie 05:18, 17 September 2007 (UTC)[reply]
A few points/queries: (1) the italic form of the font from {{Grek}} is not so good (cf φίλιος with φίλιος), could it be enlarged (φίλιος)? (2) I am not sure that I understand the optional nature of this, resulting in Greek words appearing in a mixture of fonts - most users will think that there is significance in this variation. (3) Are we stuck with the name "Grek", which departs from the 2-letter language codes used elsewhere, making it harder to recall those used infrequently? —SaltmarshTalk 10:14, 23 September 2007 (UTC)[reply]
I agree that we should identify the variations and seek to standardize them. So that I can better understand the problem, can you point out a circumstance where we put Greek terms in italics? Just one example of an entry with italic Greek text should suffice. (I ask because there may be a technical problem, causing you to see italics where I see non-italics.) Regarding the template name, please notice that the two-letter codes are for languages, but this is a script template, so it should have the four-letter, initial upper case, ISO script code. Does that make sense? Rod (A. Smith) 20:25, 23 September 2007 (UTC)[reply]
gaol#Etymology was where I came face-to-face with it when I was completing a request for Greek script. —SaltmarshTalk 05:45, 24 September 2007 (UTC)[reply]
Very helpful. Thank you for pointing that example out. The easy answer is just to remove the italics. Coincidentally, though, I also am proposing {{term}} as a means of formatting such things. So, the standard form for your example and the form using the draft {{term}} follow:
German ''[[geil]]'', “wanton”; Greek {{Grek|[[φίλιος]]}}, “friendly”.
German geil, “wanton”; Greek φίλιος, “friendly”.
German {{term|geil||wanton|lang=de}}; Greek {{term|sc=Grek|φίλιος|tr=fílios||friendly}}.
German geil (wanton); Greek φίλιος (fílios).

How do those examples appear for you (both the wikitext and the rendered format)? Rod (A. Smith) 06:06, 24 September 2007 (UTC)[reply]

Excellent - both the appearance and means of achieving it, thanks. —SaltmarshTalk 04:17, 25 September 2007 (UTC)[reply]
Am I right in assuming that the template is intended for use in all occurrences of Greek characters? —SaltmarshTalk 06:33, 25 September 2007 (UTC)[reply]
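The naming distinction raised in this thread (lower-case two- or three-letter codes for languages, four-letter codes with an initial capital for scripts) can be illustrated with a small check. This is a sketch only: the patterns test the shape of a code, not whether it is actually registered, and the function name is hypothetical.

```python
import re

# Shape of an ISO 15924 script code: one capital followed by three lower-case letters.
SCRIPT_CODE = re.compile(r"[A-Z][a-z]{3}")
# Shape of a two- or three-letter language code, such as "el" or "grc".
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}")


def classify(code):
    """Return 'script', 'language', or 'unknown' based on the code's shape."""
    if SCRIPT_CODE.fullmatch(code):
        return "script"
    if LANGUAGE_CODE.fullmatch(code):
        return "language"
    return "unknown"


for sample in ("Grek", "Deva", "el", "grc"):
    print(sample, classify(sample))
```

Combined tags such as grc-Grek fall outside both patterns here and would need to be split on the hyphen before each part is checked.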

Braille letters, digits, and other symbols

I just noticed Braille letters A (⠁) and Z (⠵) had entries. They weren't in a standard style, so I cleaned them up a bit and created categories for Braille letters and digits. Any changes to make before continuing? Rod (A. Smith) 00:49, 15 September 2007 (UTC)[reply]

Those both look fine to me. Nicely done. --Connel MacKenzie 00:55, 15 September 2007 (UTC)[reply]

Further reading

I propose to add a rule about external links (surprisingly, I can't find this rule anywhere):

  • In the main namespace, external links provided only for encyclopedic (or commercial!) reasons, without any linguistic reason, are prohibited, except links to Wikipedia (links to Wikipedia provide a bridge between the linguistic and the encyclopedic worlds).

If a user wants to put a link to his personal website (or to his company site) in his user page, why not (but he must be aware that these links are not considered by Google algorithms). Lmaltier 06:53, 15 September 2007 (UTC)[reply]

Note that that isn't quite right. While WMF servers have "nofollow" settings, no mirrors ever do. (Why else would we be spammed so regularly?) --Connel MacKenzie 08:18, 16 September 2007 (UTC)[reply]
Wiktionary:What Wiktionary is not says something about the issue:
Wiktionary is neither a “mirror” nor a repository of links, images, or media files. All content added to Wiktionary may be edited mercilessly to be included on the site.
It isn't terribly clear, though, so I'd support amending it with your wording. Rod (A. Smith) 07:04, 15 September 2007 (UTC)[reply]
This is a worthy proposal, but I'd like something even stricter, to rule out random links to third-party linguistic resources like Dictionary.com as well. IMO such references are only appropriate when used to back up specific assertions in the entry (and wherever possible, such references should link directly to scholarly writings or specialized references, rather than to other dictionaries). -- Visviva 13:16, 15 September 2007 (UTC)[reply]
Or when the reference is helpful in some other way, e.g. reference to a video recording explaining some linguistic issue in a lively way. You cannot imagine every possible case but, you are right, the link should be helpful in some way, and this should be mentioned. Lmaltier 13:59, 15 September 2007 (UTC)[reply]
The proposal looks good to me as well. --EncycloPetey 15:36, 15 September 2007 (UTC)[reply]

Categories, semantic and contextual

Hi all,

OK, this has been bugging me for a long time, so I'm just going to get it off my chest.

Much of our current topical structure is problematic. Consider Category:Anatomy, which contains both a large number of common terms for body parts like arm and head, as well as technical anatomy terms like caudal. This is an impressive jumble, but it is difficult to see its value, either to the end user or to us. I would like to propose the following general approach:

  1. That categories pertaining to the technical terminology of a specific field be at [[Category:<language> <field> terminology]], thus for example Category:English anatomy terminology (or Category:Anatomy terms, if preferred). Membership in a technical lexicon is a property of the word, not of the referent.
    1. That most categories generated by {{context}} and its ten thousand children should be handled in this way, since they deal with the contextual use of the word rather than with its semantic meaning (see recent discussion of Category:Vulgarities).
    2. That all contextual-use categories be grouped under Category:Lexicons, possibly with the addition of trunk categories like Category:Technical terminology.
  2. That categories pertaining to context-independent meaning (i.e., to the referent) be kept in a clear hierarchical relationship based on meronymy and hyponymy (bodies:body parts:arm).
    1. That we acknowledge the direct linkage of semantic and POS affiliation; for example, Category:Body parts is by nature a direct or indirect descendant of Category:Nouns.
    2. That we give serious consideration to the lessons of WordNet in defining fundamental categories for each POS. Miller & Fellbaum's work identified 26 fundamental noun groups and 14 fundamental verb groups; we needn't rely on this specifically, but it's an obvious point of reference (at least for nouns and verbs).
  3. That categories with overlapping contextual and semantic affiliations should be permitted only where there is a genuine intersection; thus Category:Proteins is properly contained in a biochemical terminology category, in addition to a semantic category such as Category:Substances.
  4. That the existing topical categories be maintained primarily as user-convenient access points to the relevant terminological and semantic categories and appendices.

The above may not be the best solution -- I'm just tossing it out there because it's been on my mind -- but I don't think that our current topical structure can ever yield satisfactory benefits. This basically stems from the fact that words don't have topics; in this respect they differ from websites, encyclopedia articles, books, and the like. Topical ontologies are marvelous on WP and DMOZ and the town library, but can't perform adequately in a lexicographic setting. -- Visviva 14:08, 15 September 2007 (UTC)[reply]

Personally, I think this sort of thing is better handled with an Index: or Appendix:. That does not mean that I am opposed to the idea of refining our categories; rather, I am going to advise care in whatever we choose to do. Consider that the current topical categories are set up (as often as possible) to match the context at the head of the definition line, and not for any other reason. Thus, a word used in discussing anatomy, or a sense used in an anatomical context, will have (anatomy) at the head of that definition line and the context template used to insert that text will simultaneously categorize the word so that all such words with that context will appear in a single category. That doesn't mean that we can't add additional categories to an entry, but we shouldn't try to completely divorce such entries from the primary category either. As an example, Category:Astronomy has subcategories of Category:Constellations and Category:Stars. The words in these categories have (astronomy) as their context, but use a separate template that categorizes them in the appropriate subcategory of Astronomy. In short, whatever topical separations we might make should be carefully structured to maintain ties to the category of the parent context. --EncycloPetey 15:45, 15 September 2007 (UTC)[reply]
Yes, and I definitely don't want to undo any of the good work that has been done so far. My preferred approach is to maintain the existing topical structure -- which has definite advantages in terms of user-friendliness and cross-project compatibility -- but split most of these into semantic and contextual categories with their own hierarchies; so we would have Category:English astronomy terminology alongside Category:Constellations et al. (most constellation names obviously not being technical astronomical terms, although I suppose many star names are). Ideally I'd say in any case where {{context}} is appropriate, the category should be lexiconic rather than topical -- that is, it is appropriate to label an entry (astronomy) only if it is actually a technical astronomical term.
I had been thinking about using Wikisaurus: or Appendix: space for this, but these are basically properties of individual word senses -- as our pervasive use of {{context}} shows -- so categorization seems like the ideal method. This doesn't need to be any sort of massive enterprise; my ideal approach would be, after some discussion here, to do a proof-of-concept on some small chunk of the topical tree ... then to return to the community for discussion and an update to Wiktionary:Categorization (which is overdue for an update anyway) ... and then to work on gradually applying these principles to the topical tree as a whole. Since this doesn't involve uprooting the existing tree -- more like planting some rigorous saplings alongside it -- it can be done gradually without undue disruption. -- Visviva 04:32, 16 September 2007 (UTC)[reply]
But we shouldn't have any topical categories that identify the words as specifically "English". Topical categories are English by default, and when they are not English they are prefixed with the appropriate ISO code. --EncycloPetey 18:21, 16 September 2007 (UTC)[reply]
That's my understanding as well... Any categories related to what words mean, such as Category:Stars and Category:Vehicles, viz. semantic categories, should be English by default (at least that's our practice, and I don't seek to change it). On the other hand, our practice for categories based on usage has been somewhat confused, but tending toward the same [language] [category description] convention found in POS categories. So I would have assumed that a lexicon category for technical terminology in field X should be at "[language] field X terminology." That was -- I thought -- the point of the recent flap over Category:Vulgarities and others. I have no real objection to "Category:Field X terminology" with language prefixes for non-English categories, but I didn't think that was our preferred convention.
I don't really understand *topical* categories as such (although again I don't propose to do away with them). Words have senses and usage characteristics, but they don't have topics. There is no real limit to the words one can use to discuss the topic of astronomy; on the other hand, the words or senses one can use to name astronomical objects, or which are largely unique to learned astronomical discourse, are relatively limited, and therefore suitable as a basis for categorization. -- Visviva 04:24, 17 September 2007 (UTC)[reply]
But words have senses that are only used in a certain topical context. If I say Jupiter in a conversation about astronomy, I probably mean the planet. If I say Jupiter in a conversation about Ancient Greece, I probably mean the deity. Additionally, topical categories serve the purpose of allowing users to find words they don't know on the basis of their context. So, if I'm looking for a particular astronomical term that I heard once but can't remember, I can look through a list at Category:Astronomy. These categories also provide a useful reference for specialists looking to learn technical jargon for a particular field in another language, such as a doctor who plans to work in a relief hospital in another country and who wants to learn basic vocabulary in order to communicate with the patients. --EncycloPetey 15:37, 17 September 2007 (UTC)[reply]

Apart from the flap about English being a default, I think this is a good proposal. It gets at the heart of an issue that has been left open to interpretation without answering any questions solidly. When a category is not a context, such as time, it is clear that it is not appropriate as a context label for any term. When a category is largely unheard of, such as combinatorics, a context label on the definition line is probably appropriate for every term in the category. The real question is what a context label means. Was Marcus Manilius talking about the gods or the planets when he looked up at the heavens? Is the cow that jumped over the man in the moon astronomical or astrological? Do people who believe in Martians have a different definition for "Mars" than astronomers, or are they also doing astronomy?

To say that there is a scientific meaning for a term does not mean that the term has to be used in that or any specific context. I would go so far as to say that, for linguistic purposes, the accepted scientific meaning is more often incorrect. Since its inception, further back than we can even trace the roots of our own language, a second has been measured, whether precisely or not, as one part in 60 of a minute, which is one part in 60 of an hour. For the last two thousand years, an hour has been considered one part in 24 of the period of a day, which as the most easily quantized astronomical event has existed in concept since prehistoric times. The scientific definition of a second, on the other hand, has changed four times in the last century alone. So what did Leybourn and Morden mean by "The Length of a Pendulum for Seconds" in 1702? A short, indeterminate amount of time? The duration of 9,192,631,770 periods of radiation corresponding to the transition between two hyperfine levels of caesium-133?

The distinction made in this proposal between a category as topical, such as ballparks for baseball, and technical, such as RBI's, is one aspect of resolving this issue. These distinctions can be very wide or they can be very narrow. Preschoolers learn what a triangle is, but the geometrical shape has a very precise mathematical definition that applies equally well to hyperbolic space. Are they fundamentally different concepts? I would say yes since the distinction is made at circle. On the other hand, imaginary numbers have only a single idiomatic meaning. But are precalculus students doing complex analysis? For that matter, are physicists? We should really look at how we topically categorize terms in the first place, and decide if and how the split for jargon would be necessary. DAVilla 17:48, 17 September 2007 (UTC)[reply]

The page Wiktionary:Criteria for inclusion currently claims — erroneously — that we exclude all place names that aren't used attributively. I've created a proposed vote at Wiktionary:Votes/pl-2007-09/Placenames stopgap, which would correct that claim. (It's a "stopgap" in that it doesn't replace this claim with a precise description of our criteria for place names, because we don't yet have such a description, and is therefore only a temporary measure to keep Wiktionary:Criteria for inclusion accurate-if-incomplete.) The vote hasn't started yet; please take a look, and let me know if there's anything you object to. —RuakhTALK 06:23, 16 September 2007 (UTC)[reply]

That is an erroneous assumption. You think WT:CFI is wrong. Yet your example France obviously is used attributively. (French fries, anyone?) I very strongly object to someone pushing their POV by starting a vote out of the blue, completely removed from reality. That guideline currently is fairly well understood (with the exception perhaps of "attributively" - which you don't address) and is actually followed, by and large. I cannot AGF about someone's motives for pushing "notability" criteria, with absolutely no previous discussion. The previous similar attempts at this type of "Wikipedia notability" nonsense have died for numerous tangible reasons. Please see What Wiktionary is not and read it - it is perhaps the oldest of our "Guideline" pages. --Connel MacKenzie 08:10, 16 September 2007 (UTC)[reply]
Come off it. There is absolutely no reason to assume anything but good faith here. This is not out of the blue. Everyone knows we've discussed this issue at length, and Ruakh has even set the tentative open date a week out to give us more time to discuss and refine the stopgap vote. (Besides, the word "France" is not in "french fries".) Rod (A. Smith) 09:21, 16 September 2007 (UTC)[reply]
That is so false, it isn't funny. Proposed votes are supposed to be reflections or solidifications of what the community currently agrees to; not violent vehicles of imposing one POV in direct conflict with existing practice. Seeing the success his friend had with sneaking past another invalid vote, he decided to bypass normal process intentionally here. Starting the vote on a new approach to a controversial topic with no previous discussion? Sorry, no. There is no possible way to assume good faith. Starting this vote was simply a malicious, disruptive act. The fact that I previously changed the default to "one week" for newly created votes says nothing in his defense. --Connel MacKenzie 15:24, 16 September 2007 (UTC)[reply]
What? O.K., first of all, I know you remember that in past discussions I was quite willing for these non-CFI-meeting entries to simply be moved to appendices; you objected. (By which I mean, many people objected, and you were one of them.) So obviously it's not my POV that CFI needs to be extended in this regard; I'm simply trying to codify a POV that seems to have consensus and that has already been applied in spite of the CFI as written. (The funny thing is, back then you accused me of POINT-pushing for supporting these entries' deletion; now you accuse me of POV-pushing for trying to fix CFI to allow these entries. Which is it? I guess to you it's more important to be accusing me of something, than to have any basis for the accusation?) Regarding your last sentence: that doesn't make sense. I wasn't sure how long to wait before starting to vote, saw that the default was six days, and decided that worked fine. If I were trying to push this vote through quickly, I could have simply started it immediately; it's not like you cast some sort of magical enchantment that would have prevented me from bypassing your default had I wanted to. —RuakhTALK 15:44, 16 September 2007 (UTC)[reply]
My "accusation" is that you are starting a vote with no discussion. Of course the general topic has been discussed at length; no acceptable modification has yet been found! That is, perhaps, the only thing that is clear from all the discussions. So you, instigating a vote for yet-another-wording of an already refuted concept isn't worthy of criticism? That does not follow. Not sure what the magical barb is all about; Rod was giving you enormously undue credit for the one-week {{premature}} phase...that was all I was refuting. But it seems you wish to take credit for that, too, anyway? Sheesh. --Connel MacKenzie 16:23, 16 September 2007 (UTC)[reply]
Ah, I see. I think you and I feel roughly the same way: we need some discussion before we start the vote, in order to make sure that the vote does in fact reflect a consensus (rather than simply splitting people into sides and determining which side is larger). However, whereas I feel that it makes sense to create exact text for the vote, so we have something concrete to discuss, you feel that everything I do is automatically wrong, and that I do it in bad faith. —RuakhTALK 16:36, 16 September 2007 (UTC)[reply]
And all the previous iterations for proper noun proposals, where such text was supplied here on WT:BP? For highly controversial votes, the only time that stunt was pulled before, was when your friend did it for possessives. (No, wait, also for "brand-names 2" - the other obviously invalid proposal.) While I admit my opinion of your personal credibility has taken an astounding nose-dive, I was critiquing your actions for this vote only; you did it wrong, and you know better. That's not an assumption of bad faith, it is a clear observation of a fundamental fact. There was no reason for you to be so deceptive. --Connel MacKenzie 18:47, 16 September 2007 (UTC)[reply]
I didn't think this was a controversial vote at all, much less "highly controversial"; my intent was for it to reflect what people already seem to agree on, leaving for later discussion things that people don't agree on. There's nothing deceptive here; you're just being either insane or malicious (I'm not sure which). Since we're being so frank, my opinion of you changes significantly over time. Sometimes you seem to be a perfectly sane human being with the desire to make Wiktionary better; other times it's fairly obvious that you're a troll with the sole goal of pissing off people who want to actually contribute here. (For the past week or so you've been in the latter category, and I'm kind of just waiting till you become sane again.) —RuakhTALK 21:02, 16 September 2007 (UTC)[reply]
How could a vote about a controversial topic not be controversial? Look at your actions, from the perspective of any person other than yourself and it is clear that you intentionally sidestepped the discussion, just to push the specific change you wanted. That, good sir, is either deceptive or malicious. I suppose I shouldn't rule out other possibilities, like insanity on your part. That would explain your recent actions with the bizarre (undiscussed, out-of-the-blue) headings in the main namespace. Or the bot stuff. Or the template stuff. Taken collectively, it seems to me that you are the one acting in an unbalanced manner. I cannot imagine any way that you could rationalize any of those, let alone all of them in succession. Are you enjoying trading barbs? I've re-written this three times now, to slow down the obvious escalation of name-calling. But you seem to be on a defensive, manic swing? Is it just a love of wikidrama? --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]
Connel, have you noticed that you are the only person who assumes that there was deception or malice here? Rod (A. Smith) 07:20, 17 September 2007 (UTC)[reply]
No, but I do see numerous others expressing similar concerns more diplomatically. --Connel MacKenzie 08:15, 17 September 2007 (UTC)[reply]
Actually I don't see the need for a stopgap if we can just come up with some reasonable criteria and put it to a vote. Connel, why don't you open a vote on that proposal you made before on celestial objects? bd2412 T 19:08, 16 September 2007 (UTC)[reply]
Because it kept getting shot down. My reading of Ruakh's and Dmcdevit's concerns on that (very long) thread, was that there is no possible way to satisfy all (nor even enough of the community's concerns) to make it a feasible proposal. When all was said and done, it was still a proposal that would inflict encyclopedic notability for English entries here on en.wiktionary.org. I don't see how to rectify that. The complaint is quite genuine: we have a CFI that is based on a word's use in language. No matter how you look at it, creating exceptions for that, based on "notability", is unacceptable to some of the interested parties. And that is a serious shortcoming, that I've come to appreciate and agree with. --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]
If someone does that in the near future, I'll withdraw my proposed vote. Until that happens, though, I have to assume that the months of stagnation indicate that we do not have consensus about what exactly the CFI should allow in the way of place names; but we do have consensus that the CFI are broken, because some place names are sufficiently important (for some value of "important") that they warrant inclusion despite a lack of attributive use. If Connel votes against this just to be a dick, that's his right, but it doesn't seem that even he actually disagrees with the proposed change, and I don't see that it will cause the vote not to pass. —RuakhTALK 21:02, 16 September 2007 (UTC)[reply]
If I support your proposal, you'll stop calling me names? We can't have that - the Earth's orbit might be put in jeopardy. But thanks for wearing your heart on your sleeve.
Your proposal is to add "with exceptions being made for place names that are of particular importance." Yes, I object to that, for all the same reasons that the "celestial objects" proposal was shot down. That is, "particular importance" is not lexical importance. (Important to who, anyhow?) But more to the point, I'll vote against it on principle - it was started out of the blue, with no discussion. (Your assertion that any of the "place names" proposals is not controversial is inexplicable, given the many kilobytes you yourself have posted on the subtopics. They are all undeniably controversial.)
I'm not sure at all what you mean, when you say CFI is broken. When I say it is broken, I mean that #1) it allows for unedited (and non-spell checked) Usenet postings, #2) the one-year date range is far too small and #3) the three citations minimum is far too low. Sadly, I don't see a workable compromise; requiring twenty (20) or more book citations from reputable publishers spanning ten (10) years would put too great a burden on the volunteers that cite entries for WT:RFV already. And that would exclude legitimate, specialized jargon. --Connel MacKenzie 06:52, 17 September 2007 (UTC)[reply]
Your claim that "'particular importance' is not lexical importance" is deceptive, as I already made clear (below) that I do intend for it to mean lexical importance, and I welcome any change to the proposed wording that would make that more clear.
Your claim that the vote was "started out of the blue, with no discussion" would be deceptive — the vote hasn't been started yet, period, and this right here? This is discussion — except that deception requires some sort of belief, or at least hope, that you might successfully deceive someone. No one here is stupid enough to buy into this claim, and you know it. Making obviously false statements that no one will believe doesn't make you deceptive; it makes you a troll.
When I say the CFI are broken, I mean a host of different things. One is the same as one of yours: that we can't increase the requisite number of citations, even though it's pretty clear that three is just too low, because currently the CFI are entirely dependent on people actually typing up each citation. Another is similar to one of yours: that they give equal weight to a Usenet posting as to an actual book, except in the special case that the book is a "well-known work" (which counts triple); I think it's clear that a citation from a book is much more valuable, and much more meaningful, than a citation from Usenet. And one is completely separate from yours: that the CFI as stated exclude France#English, even though we have consensus that the English proper noun France is important enough (for some value of "important") for it to be kept. (And, there are a bunch of other things besides. These are the three major ones, though.)
RuakhTALK 15:30, 17 September 2007 (UTC)[reply]
Sorry, but no. You created the vote (as you repeated in the very top of this section) indeed, with no prior discussion. Yes, that is "out of the blue." You've used this discussion as a vehicle for a slew of personal attacks. Your clarification below reverses the meaning of the proposal; if the wording of the proposal had said "place names, where the name itself is of particular importance..." it would be one thing, but it never did. Even that wording has far too many subjective holes though. You act as if there is no cause for complaint, when you start/create a vote, yet point-by-point acknowledge the validity of each critique. Nice. Glad to have provided you with an outlet for more name calling. How anyone might think that behavior of yours is anything but trolling, is hard to guess. --Connel MacKenzie 16:37, 17 September 2007 (UTC)[reply]
I'm sorry, but you've failed to give any reason why it might be wrong to create a page for a proposed vote, link to it here, and let it be discussed and improved — and possibly canceled — before it starts. I gather that you think it's wrong, and that it deeply bothers you for some reason. Fair enough, I won't do it again; but I don't see how you can have expected me to know that. Certainly other editors have participated in this discussion, and as far as I can tell none of them minds that a page already exists with a proposed version of the vote. I started this discussion here, and it's you who launched into personal attacks and assumptions of bad faith. (I'll grant that I shouldn't have stooped to your level, though.) Yes, I'm proposing that precise-but-erroneous wording be replaced with accurate-but-subjective wording; I consider it more important that our policy pages be accurate than that they be objective, and it didn't occur to me that other people might feel differently. (That's my fault; I failed to see the obvious analogy of main-namespace pages, where you prefer a standard structure with inaccurate information over accurate information with a nonstandard structure. I'm sorry; your point of view on this is just so strange and foreign to me, that I have difficulty taking it into account. I'll try harder.) —RuakhTALK 17:01, 17 September 2007 (UTC)[reply]
Oh really? Let me see if I understand your position, then. You are saying that votes should be started with no prior discussion, using only the default one week "rewording" period to determine if those new votes (which some will say should never have been created in the first place) should be withdrawn? Furthermore, you think that sort of disruption (for controversial topics especially) should be encouraged? I could accomplish quite a bit of policy reformation, if I stooped to that level. But WT:VOTE would be quite overwhelmed. Your analogy to the main namespace is cute; but again, I don't see the point of having completely unusable useless data clogging up searches...GIGO. The fact that so many of the bizarre deviations cause secondary problems seems to mean nothing to you at all; that, I do think is more than just strange. Your continued misrepresentation ("standard structure with inaccurate information"? No.) shows that you still wish to troll here. Good going. One mistake of mine amidst a thousand corrections is license for you to harp on and on? Yet you enter batches of intentionally useless material in the wrong place, then grouse for weeks when I suggest it should be corrected? Yes, my opinion of you has now managed to drop even lower. --Connel MacKenzie 17:27, 17 September 2007 (UTC)[reply]
Re: "my opinion of you has now managed to drop even lower": I'm starting to think that's something I should be happy about. :-) —RuakhTALK 17:32, 17 September 2007 (UTC)[reply]
Well then, have a great day. Funny, that actually trying to respond to the rational-sounding points you make and actually pointing out where you went wrong, results in more snipes from you. What you did, starting the vote, was wrong. Attacking me because you made an error and I pointed it out, is understandable. You have my pity. --Connel MacKenzie 19:17, 17 September 2007 (UTC)[reply]
I don't see this as an issue of bad faith either, and even if it were, I don't see where bringing that into the discussion would help. That's what the whole spirit of AGF is. No matter what our first instincts are, if we train ourselves to react as if we thought something was done in good faith, we're likely to get the more cooperative response, so it is almost never useful not to do so. Dmcdevit·t 10:19, 16 September 2007 (UTC)[reply]
I just noticed this after I had posted to the proposal's talk page. To be clear, so it doesn't look like I'm just being contrarian: I don't have any love for our current placenames CFI either; I think it is wrong, actually, and (*cue sound of distant universes popping out of existence*) I think we should be allowing certain non-attributive placenames. This is the wrong way to do it, though, and it is largely the same problem as all three other votes proposed so far, just in fewer words. It seems every time people try to solve the problem it comes out as a definition of notability, not word usage. Certainly that is not current practice as the proposal implies, though. I'm crossposting the posts from Wiktionary talk:Votes/pl-2007-09/Placenames stopgap below, if no one minds. Dmcdevit·t 10:19, 16 September 2007 (UTC)[reply]

Importance as a criterion

I really believe that importance (of a place or a person) is a good criterion for Wikipedia, but not here. The criterion was linguistic. I agree that it must be changed, but the new one should be linguistic too (e.g. Confucius, White House, Le Havre and France should be accepted because they are words (or can be considered as words), George Washington or Washington Street should not be accepted, because they cannot be considered as words). Of course, this should be refined, but you can see the idea. Lmaltier 07:13, 16 September 2007 (UTC)[reply]

I agree. Importance has nothing to do with it, and we should not be entering into debates on notability (although, note that all cities, towns, villages, etc. with any population on any census records are considered notable on Wikipedia. This suggests that importance can be defined rather broadly). The problem is that a word is a word if it has usage in its particular language, not if the thing it represents is important. Is "frtyu" an important place name? It might qualify, since that's the word that I just coined for "France," a notable place. This is an absurd example, but I would say that adding many other "important" place names would be similar coinages. As I've said before, I believe that "Melekeok" is not a word that can be fairly said to have entered the English vocabulary. This does not mean that the capital of a sovereign nation is not an important place, but if in all of JSTOR, there is not a single reference to it without parenthetical explanatory context (i.e., what you would do when introducing a foreign word that is not translatable), I would say the word is not important. In terms of a dictionary, trying to construct a criterion for inclusion on the basis of the real-world importance of the concept a word refers to, and not the word itself, is necessarily arbitrary, and encyclopedic. Dmcdevit·t 08:00, 16 September 2007 (UTC)[reply]

Hmm. In a draft y'all didn't see, I originally wrote "particularly important place names", but then I realized that that could sound like "{{particularly important} place} names", i.e. names of particularly important places (an encyclopedic criterion, not a linguistic one); so, I rephrased in a way that I thought was unambiguous: "place names that are of particular importance", where I thought it was clear that the names were what had to be important. Judging from your comments, however, it seems my rephrasing was insufficient: it still sounds, or risks sounding, like it's talking about encyclopedic notability. Note: this paragraph was edited 17:01, 17 September 2007 (UTC) to fix a major typo; specifically, to replace "place names" with "places" in one spot.

I guess what I'm trying to say is, I completely agree, and that's what I was trying to say to begin with. Please help me out by proposing a phrasing that doesn't have this problem. :-)

RuakhTALK 15:25, 16 September 2007 (UTC)[reply]

I understand the urge to want to include "important" names (how could we not have France in a dictionary?) but I am also uneasy about getting there via a requirement that comes down to a form of notability. Having said that, I do not have a good alternative proposal. Explicitly allowing certain classes of place names seemed like a good way around some of this issue but past that I hope someone else can see their way out of the thicket. I don't think I can support this proposal as it stands. ArielGlenn 15:38, 16 September 2007 (UTC)[reply]
"Particular lexical importance"? Or "place names which most members of the community, as of today, regard as lexically important"? The first would rule out encyclopedism (I think), the second would describe our actual practice. Ultimately our concept of "importance" can only be suitably defined through a test (or tests) for lexical importance, such as those that have been proposed for brand names, phrases, et al.... however, you are probably sensible to leave such tests out of this stopgap vote. -- Visviva 15:45, 16 September 2007 (UTC)[reply]
Generally speaking, place names of particular lexical importance are names of important places, because they are used more often... But is importance important? All verbs are accepted, not only important ones. Even names of small places are interesting (if they are words), for their etymology, their pronunciation, their gentilic, etc. Also note that etymological dictionaries specialized in place names do exist. Lmaltier 19:52, 16 September 2007 (UTC)[reply]
If we weren't already swimming in namespaces, a "Gazetteer:" namespace would have much to recommend it. -- Visviva 03:52, 17 September 2007 (UTC)[reply]
Aren't gazetteers encyclopedic? I was referring to books written by toponymists. Toponymy is an important part of onomastics, which is an important part of lexicology. Lmaltier 05:48, 18 September 2007 (UTC)[reply]
I've changed it to "place names that are particularly important words", which I think means the same thing as "place names of particular lexical importance" but without using the word "lexical", which sounds too technical to me. That said, if you want to change it to "place names of particular lexical importance" or "place names which most members of the community, as of today, regard as lexically important", or anything else though, go ahead; I'm by no means sold on any specific wording, and do not by any means consider myself to "own" that vote page. —RuakhTALK 17:13, 17 September 2007 (UTC)[reply]
Personally, I would remove important altogether, and change it to place names which most members of the community, as of today, consider as includable (not as names, but as words). Would not this wording be acceptable to everybody? Lmaltier 05:48, 18 September 2007 (UTC)[reply]
To be honest, I'm not a huge fan. I prefer your wording to our current inaccurate text, but I don't really like the "not as names, but as words" part, because I consider a name to be a kind of word (albeit a very special kind). Also, it seems strange to me for a policy page to use the phrase "as of today". If you don't think we should say "important" — and I'm starting to agree with that view, as it's becoming clear that for many people "important" means "having encyclopedic importance" — then I'd prefer something like "place names that appear to have entered the language". —RuakhTALK 06:45, 18 September 2007 (UTC)[reply]
Some names are words, some are not words (nor terms), this is my point. Confucius is a word, not George Washington, Champs-Elysées is a word, not avenue des Champs-Elysées, SNCF is an acceptable term, not Société Nationale des Chemins de fer Français. Of course, only terms actually used in a given language can be included for this language, this is the general rule. About as of today, I agree with you, but I just copied your second proposal. Lmaltier 16:38, 18 September 2007 (UTC)[reply]
The word important should not enter into the discussion, because it is far too subjective. Again, not all languages provide indicators such as spaces and upper case letters. Asian languages often lack such indicators. This is why it is more important that obscure people and place names be included. Here is an example from Romance of the Three Kingdoms/Chapter 2:
Note that the indicators (spaces and upper case letters) in the pinyin and English are not present in the original text. A person who is familiar with China would probably already know that Changsha is the capital of Hunan province. However, very few people would know that 區星 is the name of a person, and that 區 should be read as Ōu (it is normally read as qū). -- A-cai 21:52, 18 September 2007 (UTC)[reply]

set similes

I see we already have mad as a hatter, but calling it just an "adjective" seems somehow missing something. There are loads of these – neat as ninepence, clean as a whistle, happy as Larry – and so on. Should we have a category for them, and what are they called anyway? I can't think of anything better than "set similes", which sounds horrible. Widsith 07:30, 16 September 2007 (UTC)[reply]

Some possibilities: famous similes, infamous similes, very common similes, widely-known simile, {{idiomatic simile}}. I'm not sure it merits a separate category...the "catsect" (category intersection) tool should be able to show {{idiom}} & {{simile}} intersections. --Connel MacKenzie 07:42, 16 September 2007 (UTC)[reply]
It seems like it's obvious from the form of the expression that it comes from a simile, so I think just labeling it {{idiom}} should suffice. (And I suspect that neat as ninepence and happy as Larry also warrant {{UK}}; at least, this Midwesterner isn't familiar with them.) I do think it would be nice to have a Category:English similes; it doesn't seem that the "set" part needs to be included in the name, because it's a given that we only include set similes. —RuakhTALK 06:53, 18 September 2007 (UTC)[reply]

Script templates

Hi everyone. I've been so busy with other things lately that I've not noticed that script templates have been edited. I need someone to explain to me why the script templates have been redirected, altered, etc. For example, the URchar template now redirects to Arab template. I have not been able to find any conversation regarding this (except the little mention of this on the Grease pit last month). I understand that to many English speakers Arabic and Urdu might be the same thing - however they're not. So, someone please explain this. I've edited thousands of entries using these templates. It makes me mad that no one has contacted users that use these templates very frequently to discuss this with them. --Dijan 16:42, 16 September 2007 (UTC)[reply]

It was discussed (rather shortly) in the WT:GP, and DAVilla immediately went and started re-arranging them to match the ISO script name codes (e.g. {{Arab}} for Arabic script). However he did a number of things badly, in way too much of a hurry. For example replacing {URchar} with a redirect to {Arab}. Which is wrong, Nastaliq should be {{ur-Arab}}. (The standard IETF/XHTML tag.) I will fix it. Note that Urdu written in Devanagari would use {{Deva}}, unless a script variant is needed there? I don't think so. Robert Ullmann 17:08, 16 September 2007 (UTC)[reply]
Thank you Robert! Yes, {{ur-Arab}} would be more accurate. No, a script variant for Devanagari would not be necessary. {{Deva}} would be just fine. Also, this is not just about Arabic script, but also about Cyrillic and others. For example, present {{Cyrl}} template seems to be designed specifically for Russian - it uses fonts designed to support Russian Cyrillic. Anyway, in the end, all we've done is just renamed (standardized) the templates according to script name, correct? Thank you so much for replying! Makes life a lot easier.  :) I know I do not participate in many conversations here, but I would appreciate if people contact me or Stephen (who also uses them very frequently) when it comes to these templates. Thanks, again.  :) --Dijan 17:21, 16 September 2007 (UTC)[reply]
Thank you, Robert, for resolving that (and catching other mistakes like mismatched CJKV). I had specifically named Nasta`līq script after making the transition as not mapping 1-to-1, but didn't know how to follow through.
How many of these variants are we going to need? Don't a number of them map equivalently (ur, ps, fa)? I can't imagine how their use would be regulated. Couldn't we pass the language code to the script template and have it do something special if needed? For instance, if we always wrapped the language, then it would be possible to boldface Latin but italicize English. On the other hand, it doesn't take a script template to do that. DAVilla 18:59, 17 September 2007 (UTC)[reply]
The thing is that the specific template for the language (ur-Arab) can then add that to the HTML: lang="ur" xml:lang="ur-Arab" (think I've got that right), and then a browser can do user specified font selection. We don't need a lot of these; Arabic and CJKV are the only serious cases. The set of tags used is maintained by IANA. (And with several previous systems, is very complicated ;-). We just need to worry about the cases we find we need. Robert Ullmann 13:08, 21 September 2007 (UTC)[reply]
Wouldn't we always want to wrap the HTML with this?
<span xml:lang="{{#if:{{{lang|}}}|{{{lang}}}-Script" lang="{{{lang}}}|Script}}">
DAVilla 18:35, 21 September 2007 (UTC)[reply]
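For anyone trying to follow the branching in the conditional above, here is a minimal Python sketch of the same logic. It is only an illustration: `span_open` and its parameter names are hypothetical, and it assumes the intent is that a supplied language code yields both an `xml:lang` of the form language-script and a plain `lang`, while a missing code yields the script code alone.

```python
from typing import Optional


def span_open(script: str, lang: Optional[str] = None) -> str:
    """Return the opening span tag that the #if: expression would expand to.

    `script` stands in for the literal script code written in the template
    (e.g. "Arab"); `lang` is the optional language code (e.g. "ur").
    """
    if lang:
        # Language given: tag with language-script, plus the bare language.
        return f'<span xml:lang="{lang}-{script}" lang="{lang}">'
    # No language given: only the script code is known.
    return f'<span xml:lang="{script}">'
```

Called as `span_open("Arab", "ur")`, this produces the `lang="ur" xml:lang="ur-Arab"` pair Robert describes; called as `span_open("Arab")`, it falls back to the script code only.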

Spanish grammar tags

Some healthy discussion about the appropriate grammar tags to use for Spanish entries has been taking place in various places, some on wiki and some elsewhere. In order to gather the appropriate input, let's continue the discussions at Wiktionary talk:About Spanish#Third-person verb form definitions and Wiktionary talk:About Spanish#Present participle grammar tags. Rod (A. Smith) 01:41, 19 September 2007 (UTC)[reply]

download wiktionary

Is it possible to download the english wiktionary database in a basic sql or txt format? — This unsigned comment was added by 87.65.86.44 (talk) at 00:34, 20 September 2007 (UTC).[reply]

Have a look at http://download.wikimedia.org/enwiktionary/. I doubt anything there qualifies as "basic", though. Mike Dillon 03:33, 20 September 2007 (UTC)[reply]
For various technical reasons I don't pretend to understand, the major database dumps are no longer available in SQL format. You have to download them as (compressed) XML and convert to SQL using the utility of your choice. -- Visviva 07:10, 20 September 2007 (UTC)[reply]
The main concern, as I understand it, is that deleted entries are not removed from the SQL database, only flagged as deleted (so they can be restored if needed.) AFAIK, the "deleted" items are only actually removed periodically, whenever the WMF cluster runs out of space. --Connel MacKenzie 07:27, 20 September 2007 (UTC)[reply]
Don't forget to mention the existence of Special:Export. Mutante 14:47, 23 September 2007 (UTC)[reply]
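As a rough illustration of the "download XML and convert it yourself" step mentioned above, the following Python sketch streams page titles and wikitext out of an export-style XML file using only the standard library. It is a sketch under assumptions, not a tested converter for the real dumps: the element names (`page`, `title`, `text`) are assumed to match the export format, the namespace URI in the sample is purely illustrative, and `iter_pages` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from an export-style XML stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free memory; real dumps are large


# Tiny hand-written sample; the namespace URI here is illustrative only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>foo</title>
    <revision><text>==English==</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

From there each pair could be written to a text file or inserted into a database of your choice. For a compressed dump, the same function should work on a file object opened with the standard `bz2` or `gzip` modules.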

I just wanted to bring awareness of the new audio player. Please continue discussion on the talk page for Template:audio. --Steinninn 01:03, 21 September 2007 (UTC)[reply]

Yes, if you follow the [file] link, the audio file can be played using the Java media player on commons. --Connel MacKenzie 06:11, 21 September 2007 (UTC)[reply]
I agree with Connel's objections--especially the ugly blocky button. Using such a player doesn't make sense for the tiny pronunciation files we have. --EncycloPetey 12:51, 25 September 2007 (UTC)[reply]

CPT?

Just checking...I seem to recall we agreed a long time ago that having an entry for each CPT code was acceptable. That is still the case, right? Assuming that is so, is CPT 13160 an acceptable format? --Connel MacKenzie 06:13, 21 September 2007 (UTC)[reply]

http://www.ama-assn.org/ama/pub/category/3657.html Seems to be a problem. --Connel MacKenzie 06:58, 21 September 2007 (UTC)[reply]
Copyright issues aside -- although those alone are probably sufficient to scotch any program of inclusion --, this doesn't strike me as something that plays to our strengths as an open dictionary. There's just not that much that can be said linguistically about a single numeric code. -- Visviva 07:24, 24 September 2007 (UTC)[reply]

Are standard templates too naked ?

The Wiktionary:English_entry_templates typically have just two headings listed, but Wiktionary:Entry_layout_explained#Additional_headings lists so much more in the way of information that might be included. If the idea of the template is to standardize the layout, content and structure, it would seem to be more advantageous to have the standard template be all-inclusive. One would naturally instruct users to trim bits that were unneeded, but it might give cause for users to give greater thought to the content of new additions. I'm a newbie, so insight I'm missing would be most welcome. - Iggynelix 22:26, 21 September 2007 (UTC)[reply]

I've tried an all-inclusive headings version, but entries just aren't started that way. DAVilla 13:39, 23 September 2007 (UTC)[reply]
I've never really understood why these aren't set up as standard substable templates, so that I could just type in
{{subst:new en noun|etym=[[foo]] + [[bar]]|definition=A bar with Foovian characteristics}}
or some such thing, and generate the properly-formatted entry in one swoop... Such an approach would allow ParserFunctions to be used, so that the "Etymology" section (or whatever) would be generated only if the user specified a value for it. But there is probably a reason why this has been avoided... -- Visviva 07:54, 24 September 2007 (UTC)[reply]
You mean like {{new en noun bot}}? Etymology would be a nice addition there. --Connel MacKenzie 08:08, 24 September 2007 (UTC)[reply]
Hey, that's a handy template. Actually I was thinking of something further along that road, along the lines of User:Visviva/new en noun, using subst'ed ParserFunctions to add or remove sections. (I'm still having some issues with whitespace handling in that draft.) -- Visviva 08:56, 24 September 2007 (UTC)[reply]
Yes, all the "new" prefixed templates are supposed to have " bot" suffixed equivalents. If you figure out how to do equal signs within the templates...your thing might work. (The syntax would get pretty cumbersome with one #if: per line, but at least it might work.) --Connel MacKenzie 09:01, 24 September 2007 (UTC)[reply]
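The conditional-section idea in this thread can be illustrated outside of template syntax. The following is only a sketch in Python, not the code behind {{new en noun bot}} or User:Visviva/new en noun; the function name `new_en_noun` and its parameters are hypothetical, chosen to mirror the `etym=` and `definition=` arguments in the example above. The `if etym:` test plays the role that one `#if:` per optional section would play in a subst'ed template.

```python
def new_en_noun(definition, etym=None):
    """Build wikitext for a minimal English noun entry.

    Hypothetical helper: an optional section is emitted only when the
    caller supplies a value for it, as a subst'ed #if: would do.
    """
    lines = ["==English==", ""]
    if etym:  # stands in for {{#if:{{{etym|}}}|...}}
        lines += ["===Etymology===", etym, ""]
    lines += ["===Noun===", "{{en-noun}}", "", "# " + definition]
    return "\n".join(lines)


if __name__ == "__main__":
    print(new_en_noun("A bar with Foovian characteristics",
                      etym="[[foo]] + [[bar]]"))
```

Calling the function without `etym` leaves the Etymology heading out entirely, which is the behaviour the thread is asking the template to provide.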

Noting lemma forms in WT:ELE

Conversation moved to Wiktionary talk:Translations/Noting lemma forms in WT:ELE for easy reference from here or Wiktionary talk:Translations. Please continue the conversation in either location. Rod (A. Smith) 22:25, 3 November 2007 (UTC)[reply]

Future of the dictionary

Hi guys, did you see this talk? Seems to me this is exactly where we are heading… H. (talk) 08:18, 25 September 2007 (UTC)[reply]

Yes, it's been posted before but from YouTube. --EncycloPetey 12:47, 25 September 2007 (UTC)[reply]
At Wiktionary talk:Main Page#Etmyology [sic] a while back. Excellent stuff...I should have thought of mentioning it here. --Connel MacKenzie 06:40, 26 September 2007 (UTC)[reply]

RFT entries

I think it's wonderful that we have these {{rft}} tags and that the entries show up on WT:TR but I feel that as long as something remains on the rft list: the reason for that status should probably remain in the TR, a link to the archived TR topic should be available, or at least some clear information about how to find the relevant discussions should appear on the entry's history or discussion pages. Once again, I admit I could be overlooking something. The specific entry I'm concerned with is Anglosphere & I raised this in WT:TR#Anglosphere. Thecurran 20:16, 25 September 2007 (UTC)[reply]

Indeed, once the discussion has closed/been archived the tag should be removed. I suspect that the problem is that TR discussions don't have the same character as deletion or verification discussions: they aren't "closed" but simply shuffled off into nospace after a decent interval, and many discussions don't pertain to a specific entry, so tag-removal isn't an obvious part of the archiving process. Cleaning up leftover tags should be a bottable task, though -- the bot could scan Special:Whatlinkshere/Template:rft and remove the template from those articles that don't have an incoming link from TR. -- Visviva 04:57, 26 September 2007 (UTC)[reply]
Well, {{rft}} is quite recent, compared to how long the tea room has been around. But yes, the tea room discussions pertain to single specific entries only about half the time. The ones that do, should be archived to the page's talk page. But, as Visviva points out, there isn't usually a "this is closed/answered/resolved" concept for the tea room. Perhaps any conversation inactive for a month or two should just move to the respective talk page (if there is one) or a tea room archive page, otherwise. It is now about 1/4 MB, so better archiving should start to be considered, for it. --Connel MacKenzie 06:49, 26 September 2007 (UTC)[reply]
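The cleanup Visviva describes comes down to a set difference followed by a text substitution. A minimal sketch in Python, assuming the two page lists have already been fetched by some other means (the functions `stale_rft_pages` and `strip_rft` are hypothetical names, and no real bot framework calls are shown):

```python
import re

# Matches {{rft}} or {{rft|...}} plus one trailing newline, if present.
RFT_TAG = re.compile(r"\{\{rft(\|[^{}]*)?\}\}\n?")


def stale_rft_pages(tagged_pages, linked_from_tea_room):
    """Return tagged pages that the Tea Room no longer links to.

    tagged_pages: titles transcluding {{rft}} (assumed input, e.g. taken
        from Special:Whatlinkshere/Template:rft).
    linked_from_tea_room: titles currently linked from WT:TR (assumed input).
    """
    return sorted(set(tagged_pages) - set(linked_from_tea_room))


def strip_rft(wikitext):
    """Remove every {{rft}} tag from a page's wikitext."""
    return RFT_TAG.sub("", wikitext)


if __name__ == "__main__":
    print(stale_rft_pages(["Anglosphere", "parrot"], ["parrot"]))
    print(strip_rft("{{rft}}\n==English==\n"))
```

A real bot would still need a human-reviewed dry run first, since a discussion archived to a talk page leaves no incoming link from WT:TR even when the question is unresolved.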

Proposal to expand word associations

I don't understand why Wiktionary limits itself to information that can be found in many other dictionaries without taking advantage of the virtually unlimited room that a computerized database offers. As it stands, each entry presents brief and often uncomprehensive definitions, a smattering of examples to show usage, and a list of synonyms. Wikisaurus also takes the "list" approach, requiring users to click on each item to see its definition on a separate page.

I would like to see Wiktionary and Wikisaurus combined, so that I can find all information about a word in one place. But more important, I would like to see a more expansive approach to incorporating related words having a variety of relationships to the primary term, along with a phrase describing the relationship in each case. Entries and related words would be grouped logically according to their meaning. The result would be a readable text in which both poets and technicians could easily find and grasp meanings that revolve around the primary term. And more editors would be inclined to contribute if they could add to a freeform text rather than just the constricted, formal approach of the current version.

This approach would undoubtedly make for many lengthy entries, but why not? Imagine an entry for "ball" that cites not only the various types of ball but also the sports that use a ball and the various plays in which a ball figures ("address" in golf, "alley" in bowling, "at bat" in baseball), with a phrase defining the connection ("To address a golf ball is to take one's stance and adjust the club preparatory to hitting it").

Here is an example of such an entry, in which words based on "abject" are grouped according to their meaning rather than their part of speech. Each of the related words (italicized here) would contain a link to the article containing its full definition. Obviously, the format could be improved. Any believers? Fbarw 23:23, 25 September 2007 (UTC)[reply]

abject 1 a (1) [adj] : cast down in spirit; without spirit or pride; cringing, groveling, servile, subservient

abjective [adj] : tending to make abject
abjectly [adv] : in an abject manner
One that is servilely abject is fawning.
Slavish may connote abjectness. Subservient implies compliance and obedience, perhaps abject.
An abject servility or obsequiousness is servilism. 
To cower is to shrink away or cringe, usually in abject fear of something menacing. One may crawl by advancing abjectly. Grovel implies a crawling or wriggling close to the ground, as in abject fear.
An abject parasite or toady is a lickspittle. 
One may adulate someone by admiring or being devoted abjectly to that person.
Humble may suggest an abject attitude and demeanor. Modest is without any implication of abjectness.
Mister, used in direct address and not followed by the given name or surname of the man addressed, sometimes expresses abject deference (as of a beggar).

abject 1 a (2) [trans verb] obs : to cast out; reject

abject 1 a (3) [noun] : one cast off; outcast
abjection 1 [noun] : a casting out or off; rejection
abject 1 a (4) [trans verb] obs : to cast off

abjection 2 [noun] : the discharge or casting (as of the spores of certain fungi)

abject 1 b [adj] : unrelieved by any sign of independence, courage, or originality; showing utter resignation; hopeless, helpless, supine

Someone craven is characterized by abject defeatism. Someone puling is of an abject nature. Pusillanimous connotes abjectness. Recreant implies abject lack of resistance. Supine suggests lethargic abjectness. 
To lie down is to submit abjectly to defeat, disappointment or insult.
Superstition may be an irrational abject attitude of mind toward the supernatural, nature or God.
Something done poorly is done abjectly [arch].

abject 2 a [adj] : sunk to or existing in a low state or condition; underfoot

abjection 3 [noun] : a low or downcast state; degradation, humiliation 
 Dirt and ruin [arch] are abject states. 
abject 2 b [trans verb] obs : to cast down; abase

abjection 4 [noun] : the act of making abject; humbling

To reduce someone to abject poverty is to pauperize that person.
And how will people with limited English make use of this? What will happen when a single page contains information for several words spelled differently, and must cope with several languages as well? The approach you've described looks more like a print dictionary than what we have here. Look at an entry such as parrot to see a well-filled out English entry. Look at ser to see how we accommodate the fact that the spelling "s-e-r" occurs in multiple languages. --EncycloPetey 06:48, 26 September 2007 (UTC)[reply]
Actually, it looks similar to one of the proposed Wikisaurus formats. As I recall, that style didn't have any opposition, but it also had no one willing to go to that great an effort, either. (The difference, was that the proposed/abandoned format covered only "Wikisaurus:abject" in the above example, but would have separated "Wikisaurus:abjection" onto a separate thesaurus page.) --Connel MacKenzie 07:02, 26 September 2007 (UTC)[reply]
This corresponds somewhat to #Future of the dictionary above, where the speaker talks about "clickiness" and the similarity between outdated print dictionaries and online dictionaries that have not built to their potential. It also relates to a previous request to highlight a particular definition on a page. This is useful information, but is there any way to reference it without having to duplicate it everywhere? Can we transclude {{abject&def=ae923fc42llc47bf}}? DAVilla 16:37, 27 September 2007 (UTC)[reply]
OmegaWiki uses WikiData to implement the relational structure that is required for an endeavor. Without WikiData or some similar extension, it's pretty difficult to support the complex transclusion structure that would be required under out-of-the-box MediaWiki. Rod (A. Smith) 23:27, 27 September 2007 (UTC)[reply]

Some replies to the above comments:

For EncycloPetey: People with limited English (and aren't we all limited?) will type "abject" in the search box and be directed to this page. However, even with a phrase showing how the entry term is linked to the associated word ("Slavish may connote abjectness"), the user may still want to click over to the "slavish" entry to see what other meanings it may have, as the entries for italicized words do not necessarily convey the full definition of those words. An element of this approach is that the description for each associated word is tailored to include only the part of its meaning that ties it to the main entry.

Another element is that words based on the entry term ("abjective", "abjectly", "abjection") appear on the same page with their full definition(s). If a secondary term relates to only one (or several) of the meanings of the primary term, it is placed in close proximity to that definition ("abjection", meaning a casting out or off, follows "abject", to cast out). "Abjection" would not have a separate page, and a search for it would lead to this page. Although my example does not cover this, the full entries for all synonyms of "abject" ("cringing", "groveling", "servile", "subservient") could also be included on the same page, allowing the user to compare the usage of each. The idea is to provide a database in which the definitions and associations for each term are classified by their meaning, yet each term is accessed simply by typing it into a search box.

Connel MacKenzie correctly notes that this approach would require "great ... effort". You bet! I have completed an entry for "ability" (including "able", "capable" and "capacity", among others) that weighs in at 2.8 megabytes (without images). But what are a few thousand megabytes to the Wiki community? Yes, it would look a lot like a print dictionary, but is that a crime? People who like to read dictionaries would love this one.

DAVilla is concerned about duplication. Yes, there would be a lot of it, but not having to click about so much would make such a database much friendlier for users.

For A. Smith: With each of the associated-word descriptions hand-tailored for each main entry, there would be less room for transclusions. I'm afraid this approach relies a great deal on human intervention.

I have not yet worked out how multiple languages would be dealt with, but this doesn't seem like an insurmountable problem; the "ser" example cited by EncycloPetey should work well with my proposal.

EncycloPetey calls attention to the "parrot" entry in Wikisaurus. First, the layout of that page is neat and appealing, no doubt. My sample needs better formatting, since my original document in Word does not transfer well into the Wiki editing format. All of the non-definitional material, such as etymology and pronunciation, should of course be retained. But because my approach classifies by meaning rather than parts of speech, the noun and verb meanings of "parrot" relating to repetition would be together, separate from the "bird" meanings. Also, the adjective meaning "of, resembling, or of the nature of a parrot" would be included, followed by derived terms such as "parrot fever" and "parrotfish". Most importantly, all of the various (significant) types of parrot would be listed, organized according to their scientific classification, together with full definitions (in this case brief descriptions of appearance, habitat and idiosyncrasies). Distinguishing features would also be included separately, such as "cere", a soft swollen mass, often feathered in parrots, through which the nostrils open at the base of the upper mandible. All of this might duplicate some information in Wikipedia, but the emphasis would be on individual words rather than essay-length discourses.

Thanks for your comments. Can anyone direct me to a project that might find this proposal useful? Fbarw 21:53, 3 October 2007 (UTC)[reply]

I can't answer your question, but it seems to me that your approach would benefit greatly from standard formats for long entries that hid information 'under' buttons so that an initial screen showed what people most commonly needed and showed the complete range of content for the entry. I don't know what combination of show-hide buttons (size?), lead, table of contents, and format rules would do the job, but there ought to be a way. DCDuring 14:49, 9 October 2007 (UTC)[reply]

Bot vote: Interwicket

Please see Wiktionary:Votes/bt-2007-09/User:Interwicket. A new 'bot that is much more efficient than the bot code designed for the 'pedias. Make sure you look at User:Interwicket/code if you are at all interested in how it works at present; understand it is a work in progress. (You might like to look at User:AutoFormat/code too, although not relevant to this.)

We haven't had a working iwiki bot since July, User:VolkovBot was going to run, but only ran on the 15th, and a few edits on the 20th. We need a better bot with less overhead; we'll see how it goes. Robert Ullmann 23:36, 25 September 2007 (UTC)[reply]

Important notes: (see WT:GP for more info) this is far bigger than anticipated; it turns out that bots running interwiki.py on Special:Newpages or whatever have missed something like 1/4 of a million iwikis. Once they miss it, it is not recovered. I am trying to run code to do some of them; but it takes a while: today (last 24 hours) I have had 3 threads running at/from "d", "m", and "s"; none has made it out of its start letter.

A couple of people have commented on the name, please do. Do note that because of the number of edits, renaming is not a good idea (large overhead, even though User:UllmannBot has been doing a lot); but creating a new name would be fine. Robert Ullmann 23:09, 30 September 2007 (UTC)[reply]

Wiktionary is not a usage guide

I'm surprised there are not more issues related more directly to the dictionary/grammar/style and usage guide distinctions in Wiktionary:What Wiktionary is not.

See this diff on enormity. Just like any proper dictionary, it is not our place to occult senses that exist, but happen to be disputed, sometimes hotly so. Although it is part of our duty to note where disputed usage exists (hence the recent creation of {{proscribed}}), we should not make factual statements about errors of meaning when semantic change is real and acknowledged (although likely denounced) by authorities.

All this to say, what do you think of adding a "Wiktionary is not a usage or style guide" entry to What Wiktionary is not? Circeus 18:54, 26 September 2007 (UTC)[reply]

Writing the text for that will be very difficult, since we do want information about usage here. When use of a word stamps the user with a particular regionalism, social class, level of profanity, or educational level, we would like the user to know about that. However, we don't exclude words just because they are vulgar, stigmatized as incorrect, or are likely to incite anger. --EncycloPetey 03:33, 27 September 2007 (UTC)[reply]
My point has to do with dispute over fluctuation in usage, especially definitions (mostly stuff like what is found on w:List of English words with disputed usage). We should not say "definition x is wrong", but rather "many usage writers strongly feel that usage x is semantically/grammatically wrong", or "usage x is the object of disputes amongst usage writers". And we should certainly not eliminate a disputed definition altogether, as was done in enormity. That would be what we Frenchies call an énormité. Hey, Petey! I didn't know you hung around here too. Circeus 05:25, 27 September 2007 (UTC)[reply]
Indeed, that is the approach we've always taken. Actually, that's not entirely true; previously, disputed items were deleted. The tags and usage notes are an essential part of making Wiktionary live up to "all words in all languages" - without them, we'd be back to mass-deletions. --Connel MacKenzie 20:15, 15 October 2007 (UTC)[reply]
In other words, we must have a neutral point of view on our subject (language). Lmaltier 05:38, 27 September 2007 (UTC)[reply]
Actually, that's an excellent way to put it. *Snickers* I should have thought about it myself. Circeus 05:49, 27 September 2007 (UTC)[reply]
There is a policy under discussion : Wiktionary:Neutral point of view. For those interested, there is also an adaptation of the general Wikimedia policy to the French wiktionary (an adaptation was needed, because this policy was written with Wikipedia in mind, but principles are sound) : fr:Wiktionnaire:Neutralité de point de vue. Lmaltier 20:46, 27 September 2007 (UTC)[reply]
Thanks for linking to fr:Wiktionnaire:Neutralité de point de vue; it's an interesting read, and IMHO significantly better than Wiktionary:Neutral point of view. That said, neither one addresses my biggest question about applying NPOV here, which is how we ought to handle cases where different POVs imply different structures for an article. If different sources disagree about whether two uses of a word are etymologically related, how do we handle that? Similarly, if different sources disagree about the appropriate POS for a word, how do we handle that? We don't have any mechanism for ambiguously structured entries. (Perhaps both Wiktionnaire's document and ours avoid this question because no one has a satisfactory answer yet. ;-) —RuakhTALK 22:17, 27 September 2007 (UTC)[reply]
In the first case, there seems to be a need for separate etymology sections. Separating what could have been grouped cannot be wrong, if there is a reason to do so. In the second case, why not include both, again with appropriate comments? If the divergence is only about the name, the most usual name should be chosen. Lmaltier 16:46, 28 September 2007 (UTC)[reply]
Traditionally, disputed senses have gone to WT:RFV (I see no major flaws with continuing that.) Questions about etymology or references have typically gone to WT:TR. (That too, still seems tenable.) For multiple etymology issues, it really hasn't been a particular point of contention yet. As User:Lmaltier indicates, splitting into multiple etymologies when justified (despite how distasteful it sometimes may be) has been the general tactic used to resolve disputes. I agree it seems likely to become an issue as Wiktionary grows. It might be wise to advocate public domain references over copyright-protected sources, in general. --Connel MacKenzie 20:15, 15 October 2007 (UTC)[reply]

Rendering Afroasiatic scripts

I know that I can't read Hebrew well with my current w:Firefox setup, because all the nequdot are displayed separately from their letters. This really hurts dagesh usage. This may be why I can't read the following properly. I want to know how to get a better view like I had in w:MSIE.

I just noticed that in both Allah and Hezbollah, I see each letter displayed separately but normally when I type الله (Allah) in Arabic Word Processors, the single Unicode glyph U+FDF2, ﷲ, is displayed; even in bismi l-lāh. Now, has my browser got it wrong, my understanding got it wrong, or have the writers got it wrong?

— This unsigned comment was added by Thecurran (talkcontribs) at 19:01, 26 September 2007 (UTC).[reply]

In both Allah and Hezbollah the glyphs are displayed. However, they are a little bit modified here due to the Arabic fonts used by the {{Arab}} template. --Dijan 20:19, 26 September 2007 (UTC)[reply]
As for the Hebrew, it turned out that both Windows itself and the standard Windows fonts up to Windows 2000 all had a wrong implementation of Hebrew script. They depended on the opposite order of dagesh and nikud as was specified in the Unicode standard. When the MediaWiki software implemented Unicode normalization this caused all the Hebrew entered by users to work with broken Windows to be fixed but the side effect was that everybody still using a broken version of Windows now saw broken Hebrew text. Windows XP has fixed rendering software and fixed Hebrew fonts. As a bonus it also renders old Hebrew text that was not normalized correctly. No upgrade was made available for older versions of Windows.
The developers have discussed adding an option to reverse this part of Unicode normalization for users without Windows XP but with so much other stuff to do it's still waiting. — Hippietrail 23:13, 26 September 2007 (UTC)[reply]
I use Firefox and XP. Under "Control Panel" > "Date, Time, Language, and Regional Options" > "Regional and Language Options" > "Languages" > "Supplemental language support", if I have "Install files for complex script and right-to-left languages (including Thai)" unchecked, then I get what you describe, with nikud coming after the letter, as though there were a non-breaking space there. If I have it checked, then it works properly. I make no promises for you, but when I pointed this out in the relevant Bugzilla entry, other users commented that it solved the problem for them as well; you should try it. (Fair warning, though: you may, depending on your setup, need your XP installation CD in order to install these files. Also, the bug has been fixed in the trunk, so if you're keeping up with Firefox updates it shouldn't be terribly long before the problem fixes itself for you anyway.) —RuakhTALK 01:46, 27 September 2007 (UTC)[reply]
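The reordering Hippietrail describes can be reproduced with Python's standard `unicodedata` module. This is only an illustration of Unicode canonical ordering, not of MediaWiki's own normalization code. Combining marks are sorted by canonical combining class, and the dagesh (class 21) sorts after a vowel point such as patah (class 17), so text typed as letter, dagesh, vowel comes out as letter, vowel, dagesh.

```python
import unicodedata

BET = "\u05D1"     # Hebrew letter bet
DAGESH = "\u05BC"  # Hebrew point dagesh
PATAH = "\u05B7"   # Hebrew point patah (a vowel point)

# Order a user might type: letter, dagesh, then vowel point.
typed = BET + DAGESH + PATAH

# Normalization sorts the marks by combining class, lowest first.
normalized = unicodedata.normalize("NFC", typed)


def codepoints(text):
    """Return the code points of text as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    print(codepoints(typed))
    print(codepoints(normalized))
    print(unicodedata.combining(DAGESH), unicodedata.combining(PATAH))
```

A renderer that only handles the dagesh-first order will therefore misplace the marks once the stored text has been normalized, which matches the symptom reported at the top of this thread.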

Quoting quotations.

Sometimes, as in all three quotations at who shot John, we obtain a quotation from a secondary source. In some cases, including the aforementioned, the secondary source has the entire quotation, in which case all is well; we identify the actual date and source of the quotation, add "quoted in", and identify where we got the quotation from. But in other cases, such as at Necronomicon, the secondary source has only part of the quotation, so the above approach is unsatisfactory: ideally, we'd like to include some of the context from the secondary source, and obviously we can't attribute that content to the original source. Nonetheless, the word itself was used by the original source, and the secondary source is only quoting it. I don't see a good way to handle this; does anyone have any thoughts? —RuakhTALK 19:27, 27 September 2007 (UTC)[reply]

For RFV purposes, you are suggesting it is OK to reference the quotation in another secondary source? I think that is a bad idea. But for inclusion in the actual entry, there is no way to justify reusing the same quotation, as one found in another secondary source. --Connel MacKenzie 21:14, 27 September 2007 (UTC)[reply]
Wait, did you visit the articles I linked to? It's not like I'm talking about taking quotations from other dictionaries; I really don't see what the problem is. (In particular, the phrase "reusing the same quotation" doesn't seem to apply.) —RuakhTALK 22:06, 27 September 2007 (UTC)[reply]
No, I hadn't. It was not at all clear you were talking about spoken quotations that have been transcribed and printed in a regular book. Generally, "secondary sources" can mean that, but is far enough outside the normal manner we use that term here on Wiktionary, that I was misled. It is also weird that you'd choose troublesome quotations, when so many better ones are easily available. FWIW, when I asked in April, the definition given didn't make sense. Being well before the current atmosphere on RFV, such questions used to be dealt with less formally...i.e. just the link to Google above, plus the rewrite, would probably have been sufficient. Anyhow, thank you for the citations. I see no need to encourage the "quoted in" style variant you used there. It would not be wise to prohibit it, though. --Connel MacKenzie 02:02, 28 September 2007 (UTC)[reply]
Usually I give priority to the newest and oldest relevant and vaguely-useful hits on b.g.c., and if they're not terribly great cites, then the third cite I add will be the clearest and most useful I can find. In some cases, however, I include one cite because I think it's the oldest, and then I find an older one (or I simply realize that my first wasn't as old as I thought — b.g.c. misdated it — but I've already typed it up and see no reason to throw it out), such that I end up choosing all three cites for their date or apparent date rather than for their relative usefulness. If you'd like to add additional, better cites, then by all means please do so. In particular, who shot John seems to have quite a range of meanings, and accordingly it would be nice to have a more representative sample of quotes. —RuakhTALK 02:31, 28 September 2007 (UTC)[reply]
I think the approach in who shot John is appropriate: identify the original speaker as closely as possible, and then identify the work in which they are quoted (although FTR I think that listing an "author" for a set of hearing transcripts is a bit misleading). -- Visviva 01:34, 28 September 2007 (UTC)[reply]
My two cents: It is certainly true that we should prefer the original source of a quote in most cases. I would rather cite a quote attributed to Shakespeare directly from the play or sonnet in question rather than through a secondary source. However, there are times when citing the secondary source is not only legitimate, but preferred. In cases where someone is quoting an oration, we cannot always rely on having an accurately spelled and punctuated original. The secondary source, while quoting someone, is primary in the sense that it presents a printed version. Newspaper articles are a case in point, where a journalist reports what someone said. It is certainly possible that what the person is alleged to have said was misquoted, but as a dictionary organized by spellings of words, we are interested in print citations. In sum, there are times when it is perfectly appropriate or even desirable to cite a quotation through a secondary source. As Ruakh has noted, sometimes the secondary source provides additional context not present in the original, and can result in a new shade or meaning. Such quotations are not rendered invalid by virtue of being secondary. --EncycloPetey 04:55, 28 September 2007 (UTC)[reply]
I hate having to cite these because I don't know any other approach to doing them. DAVilla 14:45, 5 October 2007 (UTC)[reply]

Place names: toward a functional compromise?

OK, so I'm sorting out the March 2007 RFDs, and I come to this pocket of place name articles. These are reasonably well-formatted entries which have gotten a lot of attention from various editors; it would be a shame to delete them outright. On the other hand, the current wording of WT:CFI unambiguously bars the vast majority of place names, and the only acknowledged exceptions to that wording involve "too-prominent-to-exclude" cases like France. Support for loosening these criteria is far from unanimous, and no actual revision to the CFI is currently in prospect. To complicate the situation, Appendix:Place names is quixotically structured to simply link to entries in the mainspace, meaning that it will always be either perversely incomplete or perversely filled with redlinks to entries that can never be created.

For today, I've been moving the entries in question to Appendix:Gazetteer, because of the structural incompatibility, but I think there is a better solution: Restructure Appendix:Place names and sub-appendices to point to subpages of Appendix:Place names as a matter of course. When an otherwise adequate placename entry is found to fail CFI, move it to Appendix:Place names/Foo and link appropriately. (So for example the entry for Abakan would be at Appendix:Place names/Abakan and linked from Appendix:Place names in Russia.) For placenames which currently meet CFI, create Appendix:Place names/Foo as a redirect. (That way, editors can be sure that if a place name has an entry somewhere, it can be reached through the appendix).

Basically, I'm not proposing any changes in what we currently exclude and include, just grasping for a solution that all parties can live with. -- Visviva 06:36, 28 September 2007 (UTC)[reply]

No objection from me. I've said before that appendices are a fine place for placenames [3], and others have argued for that as well. We can make them searchable, and prominently link them, and so on. The problem with that solution at the moment though, is that while it is already perfectly within policy and acceptable, the ambiguity caused by people that support no restrictions on placenames, or something similar, means that such within-policy actions like moving a placename to an appendix is bound to be controversial; as RfD nominations simply following our CFI have been in the past. Of course, I might be wrong, and if so, I'll be the first to help with the appendicizing of appropriate articles. Dmcdevit·t 07:08, 28 September 2007 (UTC)[reply]
I agree 110% with Dmcdevit's comment. I do not, however, agree with the proposal that "For placenames which currently meet CFI, [we] create Appendix:Place names/Foo as a redirect"; for Appendix:Place names/Foo to be useful, it would need to be able to include non-CFI-meeting senses of CFI-meeting placenames. —RuakhTALK 12:17, 28 September 2007 (UTC)[reply]
That's a good point... I guess a soft redirect of some kind would be the best solution (not sure exactly how to format it, though). -- Visviva 12:53, 28 September 2007 (UTC)[reply]
Sure. I think all that was meant by that comment was that we wouldn't omit CFI-meeting placenames from the Appendix articles, since then it might lead to them actually becoming less visible due to inconsistency in finding them. If redirects are conflicting with the non-CFI placenames, then we'd simply replace the redirect in the Appendix namespace with an article with all placenames. But if there are only CFI-meeting placenames, then the redirect might be the simple solution, or we could duplicate the content. But that's probably not the most crucial part of the proposal. Dmcdevit·t 12:57, 28 September 2007 (UTC)[reply]

Direction on form-of templates

Could those of you who have been testing different styles of form-of templates comment on them? Have we decided we want to prefix them with a language code, such as {{fr-conjugation}}, or is passing a language parameter like {{plural of|lang=fr}} still on the table? DAVilla 18:34, 29 September 2007 (UTC)[reply]

I'm still convinced that we need language-specific templates for most purposes. —RuakhTALK 19:09, 29 September 2007 (UTC)[reply]
To clarify: In the specific case of plural nouns, I think a non-language specific {{plural of}} is probably the way to go. Indeed, we can have a more general nominal-inflection template that should work for many languages' noun and adjective needs, taking a mandatory argument for the lemma and a mandatory lang(uage code), plus optional arguments g(ender — m/f/n/c/____), n(umber — s/p/dual/trial/___), and c(ase), and perhaps value (positive/comparative/superlative). Even here, some languages will need their own templates (such as Celtic languages, with their initial consonant mutations, and Semitic languages, with their noun states). But for verbs, prepositions, and so on, I don't think we can find an approach that will work for many different languages. I'm not opposed to trying, though. —RuakhTALK 19:57, 29 September 2007 (UTC)[reply]
I could be happy either way, though I prefer a more generic template that passes a language parameter, so that we don't have to keep creating new language-prefixed templates; I rather have fixed base templates and fixed lang parameters than lots of individual language-specific templates. --EncycloPetey 19:11, 29 September 2007 (UTC)[reply]
I've been thinking about how to do form-of for Japanese, and I've come to the conclusion that it would be difficult to fit Japanese morphology into the English/Indo-European mold. For instance, depending on how it's done you either have "貸して" the joining-with-other-verbs form of 貸す, or "貸し" the particular-stem-for-certain-endings form of 貸す. (There are particular names for these, which I can never keep straight.) It would be nice to have a unified template, but I don't think it's feasible. Cynewulf 20:25, 29 September 2007 (UTC)[reply]
I see no problem there. If both deserve an entry (I have no clue), then templates should be created for both these particular terms. This might be particular to Japanese, but that is irrelevant. H. (talk) 17:19, 1 October 2007 (UTC)[reply]
Err.. oh, you mean if both should have an entry. Of course Japanese conjunctives, past tenses, etc. deserve an entry just as much as English plurals. The reason I mention two forms is that the English language books I've read describe "kashite" as a verb-conjunctive form, but apparently in Japanese schools the verb form is "kashi" only, and the "te" is something unrelated that just happens to appear in all cases when joining with verbs. The question here is whether we should create language-specific templates {{ja-te form of verb}} or general ones "conjunctive form of". I'm showing that in the latter case, I'm not aware of any other language that would use the "general" template. Similar languages can share templates, but Japanese isn't very similar. Cynewulf 17:06, 2 October 2007 (UTC)[reply]
What would you see as going on the definition line, specifically? The choice is between {{ja-conjugation|言葉|form=te}} or some abbreviation of that on the one hand, and on the other something like {{te of|言葉|lang=ja|pos=v}} or whatever would be correct in this case, {{conj of|...|言葉|lang=ja}} if you prefer that (and if it is specific enough). I consider form-of templates as specific as {{ja-te form of verb|言葉}} to be a nightmare, and if you doubt that take a look at all the ones for Finnish. DAVilla 20:29, 2 October 2007 (UTC)[reply]
Hmm, yeah, having to update ten thousand templates would get old after the forty-seventh. Unifying all Japanese forms in ja-conjugation sounds nice. I don't think doing "{{te of}}" would be a good idea -- if like that, then pick an English name for it. I don't really have the perfect answers here. My main goals are maintainability for contributors and describing the form correctly and precisely to users. One thing that bothers me is using the general {{past of}} to describe past tenses, but using something like "ja-te form" for -te forms. So, I guess it's either find names for all the weird forms and create things like "conjunctive form of", or go with ja-conj (or ja-form, or ja-formdef, or some nice name) for all of Japanese. (For reference, you can get an idea of the forms by looking at 話す for verbs and 白い for adjectives) My personal feeling here is that we can make general-purpose templates like {{plural of}} that get reused over most languages, but significantly different languages would get their own single-language all-forms template. Cynewulf 21:03, 2 October 2007 (UTC)[reply]
Well, you asked what I envision going on the definition line -- I hacked together {{ja-form of}} and put it on 言った as a prototype. See Appendix:Japanese verb inflections to get an idea of how different Japanese is. There are also similar inflections for i-adjectives, and very irregular ones for the copula だ/です, and other irregular things like ます. Now could everybody please tell me what's wrong with this implementation? Don't worry, I won't start using it everywhere right away. Cynewulf 17:52, 7 October 2007 (UTC)[reply]
I like the way you have laid it out in Wiktionary:Form-of templates. Often enough, related languages have the same categories. In that case it is not very useful to have different templates for them; a lang parameter suffices (see e.g. marche for French and Spanish). And if we have a rigid naming scheme as suggested there, there should be no problem. It might be necessary to decide on an order for the possible terms, i.e. 1) person 2) number 3) tense 4) mood 5) whatever else for verbs, but maybe a different order for nouns, etc.
Of course, it can happen that some template is exclusively used in one language, but that doesn’t necessitate us to prefix it with the language code. H. (talk) 17:19, 1 October 2007 (UTC)[reply]
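The fixed term order suggested above (person, number, tense, mood, then anything else) can be illustrated with a small sketch. It is written in Python for convenience; the function `form_of_template_name` and the choice to sort leftover features alphabetically by key are assumptions, not part of the proposal.

```python
# Fixed order for verb features, following the suggestion above:
# person, number, tense, mood, then whatever else.
VERB_FEATURE_ORDER = ("person", "number", "tense", "mood")


def form_of_template_name(features: dict) -> str:
    """Compose a form-of template name with features in a rigid order.

    Hypothetical helper for illustration only.
    """
    if not features:
        raise ValueError("at least one feature is required")
    # Known features first, in the agreed order.
    known = [features[k] for k in VERB_FEATURE_ORDER if k in features]
    # Any remaining features follow, sorted by key (an assumption).
    extra = [v for k, v in sorted(features.items())
             if k not in VERB_FEATURE_ORDER]
    return " ".join(known + extra) + " of"
```

With this ordering, the same name is produced no matter how an editor lists the features, so `{"tense": "present", "person": "third-person", "number": "singular"}` gives `third-person singular present of`.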

Quotation template

See User talk:Connel MacKenzie#Format question and User talk:Doremítzwr#Quotation template. Conversation on Wiktionary talk:Quotations#Quotation template please. :) Best regards Rhanyeia 22:04, 29 September 2007 (UTC)[reply]

You've provided links to several places, but I'm still not sure what your question or comment is. --EncycloPetey 22:08, 29 September 2007 (UTC)[reply]
Now my question is, could you have a look at the different sizes and colors of the quotation marks here please. :) Please comment on them. Technicalities have been solved. Best regards Rhanyeia 20:31, 7 October 2007 (UTC)[reply]
They are blue at the moment, but it's still possible to compare the two colors on a test page. Best regards Rhanyeia 12:22, 8 October 2007 (UTC)[reply]

Transliterating Greek vowels

I have posted a couple of questions at Wiktionary talk:About Greek/Transliteration about how to transliterate Greek vowels. In summary, how should ώ be transliterated? Some options, ranging from easiest to type to least ambiguous, are o, ó, ō, and ṓ. Rod (A. Smith) 19:35, 30 September 2007 (UTC)[reply]

I would say clarity beats ease of typing, so ṓ would get my vote as the standard. However, I've always preferred to use ẃ in my own notes. -- Visviva 12:03, 1 October 2007 (UTC)[reply]
As a very minor issue, can the acute accent be made more central on the unitalicised characters? (–Or is this an irrelevant concern, being as all transcriptions will be italicised anyway?) † Raifʻhār Doremítzwr 12:24, 1 October 2007 (UTC)[reply]
Unicode defines the codepoint for ṓ, which it calls "Small letter o with macron and acute". Your browser, operating system, and installed fonts then determine how to render that character. In my environment, it looks pretty much like I'd expect it. In your environment, does the mark appear farther to the left or farther to the right than you'd prefer? Rod (A. Smith) 16:19, 1 October 2007 (UTC)[reply]
Much further to the left. Imagine if the left-hand sides of both diacritics were connected by a hinge, then you get an idea of how it looks at my end. † Raifʻhār Doremítzwr 20:49, 1 October 2007 (UTC)[reply]
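For anyone wanting to check which character is being discussed, here is a minimal sketch using Python's standard `unicodedata` module. It shows that ṓ exists as a single precomposed code point and that it decomposes canonically into a base letter plus two combining marks. Whether a given font draws the precomposed form and the combining sequence identically depends on the reader's environment, as noted above.

```python
import unicodedata

# The precomposed character under discussion.
omega_translit = "\u1e53"  # ṓ

# Official Unicode character name.
name = unicodedata.name(omega_translit)
print(name)  # LATIN SMALL LETTER O WITH MACRON AND ACUTE

# Canonical decomposition (NFD): base letter followed by combining marks.
decomposed = unicodedata.normalize("NFD", omega_translit)
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+006F', 'U+0304', 'U+0301']

# Recomposing (NFC) gives back the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == omega_translit)  # True
```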

Asian classifiers and measure words

Hi A-cai. I recently came home from Vietnam and I've been reading up a bit on the Vietnamese language and brought some of it here. I've noticed that the headings and categories etc for the various Asian languages using classifiers or measure words are not consistent. I'm on IRC #wiktionary if you are available to chat. — Hippietrail 11:52, 23 September 2007 (UTC)[reply]

I'm not sure that it is an issue that we have tackled as a group. A while ago, I added a mw variable to Template:cmn-noun. You can check out the "What links here" from the template page, to see some examples of where I've used it in the past. Of course, I'm open to suggestions for modifications if you have an idea for how to make improvements. -- A-cai 12:00, 23 September 2007 (UTC)[reply]

Yes I noticed the measure word on some han nouns and thought it was a great idea. Sadly I don't know enough about Vietnamese to know which classifier to put with many nouns though.

What I was thinking about was Category:Classifiers which now contains some of these terms for Vietnamese and Thai and space for Khmer ones as well. But there are no corresponding categories for Chinese, Japanese, or Korean. Is it that the latter three languages use "measure words" and that those are not the same as "classifiers"? Or should we choose one term to use for POS headings and categories for all languages? If not it would be a good idea to set up Category:Measure words for those three languages to match the classifier categories. — Hippietrail 12:10, 23 September 2007 (UTC)[reply]

I was just taking a look at the Wikipedia articles (classifier and measure word). It seems as though a measure word is one type of classifier. BTW, 自行车 is an example of Template:cmn-noun with the mw variable. I'll have to read the two wikipedia articles more thoroughly. For now, I can say that the Mandarin term which describes the mw variable in 自行车 () is called 量词 (lit. "measure word") in Mandarin. -- A-cai 12:21, 23 September 2007 (UTC)[reply]
In Japanese we've been using Counter. They are also sometimes "count words". (And in English, "singulatives", like head, e.g. of cattle. ;-) We should settle on something in common. As you note, "classifier" is broader than "counter", so we may need both. Robert Ullmann 16:32, 24 September 2007 (UTC)[reply]
Based on how things are organized in Wikipedia, you were correct to use counter for Japanese (see: Japanese counter word). The word in Japanese is 助数詞, which literally means help with counting word. However, the Mandarin term is 量词, which means measure word (hence Chinese measure word). It seems as though the term counter word is a synonym for measure word, but I'm not sure if there are any subtle differences between the two (I can't think of any off the top of my head). One difference (from Japanese counter word):
The problem is partially solved for the numbers from one to ten by using the traditional numbers (see below) which can be used to quantify some nouns by themselves. For example, "four apples" is ringo yonko (リンゴ四個) where ko (個) is the counter, but can also be expressed using the traditional numeral four as ringo yottsu (リンゴ四つ). These traditional numerals cannot be used to count all nouns however; some, including people and animals, require the proper counter.
In Mandarin, you have to use a measure word even for numbers below ten. -- A-cai 22:51, 24 September 2007 (UTC)[reply]
It sounds like we could use Classifier as the overarching POS section header, then place an in-line parenthetical clarification at the head of definitions, just as we use (interrogative) and (personal) under the header of Pronoun. Does that work? --EncycloPetey 15:10, 30 September 2007 (UTC)[reply]
Sounds reasonable to me. However, I agree with Hippietrail that we should probably raise the issue at Beer Parlour, so that others can weigh in. -- A-cai 22:02, 30 September 2007 (UTC)[reply]
Would one of you Asian-language experts like to bring the topic to the Beer parlour please? I think you'd do a better job than me. — Hippietrail 09:25, 1 October 2007 (UTC)[reply]
  • In Korean, these have traditionally been considered a form of noun (specifically "dependent" or "bound" nouns, 의존명사), rather than a distinct part of speech. But I assume this is not the case for all other languages, so a global Classifier heading seems reasonable. -- Visviva 11:55, 1 October 2007 (UTC)[reply]

Unifying the header as "classifier" or anything else is fine with me. "yottsu" and the other native Japanese numerals are just a weird exception; they act as counters without being counters themselves. Cynewulf 16:52, 2 October 2007 (UTC)[reply]

It sounds like we've reached a consensus so I wonder if some of you would like to create an entry or POS section for one or two each of Chinese, Japanese, and Korean classifiers and post the links to them here. I'll set up some blank categories. Please look at some words in the existing Thai and Vietnamese classifier categories for some existing examples. Please also feel free to comment on anything necessary to unify formats that will work across all languages.
Does anybody know if Burmese and Tibetan also have such word classes, or any other major languages from this part of the world? What about Tagalog or Indonesian or Balinese?
As for "yottsu" I think Korean and Vietnamese also have alternate Sino words for some numbers that are only used in certain situations. But I could be wrong (-: — Hippietrail 02:33, 3 October 2007 (UTC)[reply]
I would say we need a heading Classifier for Thai etc; but for Japanese etc it should be just Counter, which we already use, and we already have Category:Japanese counters, these are not the same thing. (English also has a singulative, but we treat it differently.) The Chinese languages should also use "Counter" (not "measure word" which is Chinese-English translationese, like "number one Chinese food" ;-). Robert Ullmann 02:43, 3 October 2007 (UTC)[reply]
With respect to the English measure word, it is unclear whether it came from Chinese or the Chinese term came from it. It could be the latter, since the Wikipedia article for measure word seems fairly detailed, and touches on languages besides Chinese such as Russian and Bengali. -- A-cai 11:40, 3 October 2007 (UTC)[reply]
  • I did a lot of Googling last night and it seems that in the field of linguistics "classifier" is the general term used to cover these types of words in all the mentioned languages. There are some great reference books. Classifiers : a typology of noun categorization devices by Alexandra Y. Aikhenvald and The world atlas of language structures both stand out and portions can be read on Google Books. It should be pretty well known by now here on Wiktionary that the POS categories used in linguistics and those used by dictionaries are not the same. My personal view would be to follow the established terminology as used by each language that has a dictionary tradition. How do big bilingual English dictionaries of Chinese, Japanese, and Korean treat these words and what terms do they use? On Wiktionary we seem to stick to dictionary-style POS in headings in articles but we have many categories with much more of a linguistics perspective. This generally adds detail and is a good thing. Given this, it seems that even if we choose to go with diverse terms as POS headings we might still go with a single term or a hierarchy of terms in our category structure. — Hippietrail 21:04, 3 October 2007 (UTC)[reply]
I checked several Chinese/English dictionaries. The term classifier was used as one of the translations for 量词 in several of them →ISBN, →ISBN. These dictionaries also included other translations such as measure word and numerary adjunct. Unfortunately, most Chinese/English dictionaries use Chinese characters to indicate part of speech/category etc. For example, in The Pinyin Chinese-English Dictionary:
běn ... ⑩ <量> [用于书籍、簿册等]: 两~书 two books/这部电影有十二~。This is a twelve-reel film.
<量> indicates that it is a 量词 (classifier; measure word). The stuff in the brackets ([用于书籍、簿册等]) is what we would call Usage notes here at Wiktionary. In English: "used for books, periodicals etc." The other two after the colon are example sentences. -- A-cai 22:33, 3 October 2007 (UTC)[reply]