This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

October 2008

Someone / somebody / one

Hi. Can we come to a consensus about phrase entries with someone('s) somebody('s) one('s) in the title? I keep finding duplicates such as twist someone's arm and twist somebody's arm. Perhaps there is already a consensus? If so, can someone/body place it in the "News for editors" (what a really useful idea, btw!) so we all know. Cheers. -- ALGRIF talk 14:55, 2 October 2008 (UTC)

No matter what standard we might set, new editors and anons are liable to use one of the other possibilities. To discourage the creation of redundant entries like this, when I create a term (usually an idiom) containing this word, I use "someone" (because it is one letter shorter than "somebody") but also create a redirect with the "somebody" variant (and another redirect with the "one" variant if it makes sense for that term). When I find an existing duplicate "someone"/"somebody" entry I usually combine them into the "someone" entry and change the "somebody" to a redirect. -- WikiPedant 15:17, 2 October 2008 (UTC)
Good question... Two thoughts: 1. I would distinguish "someone/somebody" from "one." "Twist one's arm" would suggest to me sentences like "I twisted my arm" or "He twisted his arm." "Twist someone's arm" would imply sentences like "He twisted my arm," as is correct for this idiom... 2. I prefer "someone" to "somebody," but in any case we should have a redirect from one to the other. I wonder if we could get a list of these titles? -- Visviva 15:20, 2 October 2008 (UTC)
We would benefit from a consensus. The redirect idea is obviously a good one. It just involves a little work to standardize and police, much of which would be facilitated by a list and templates or executed by a bot.
I agree with Visviva's distinction between "one" and "someone" and with his preference for "someone". The reflexive restriction on many phrases cannot be conveyed otherwise. For example, the addition of "own" in "twist one's own arm" produces a phrase that does not convey the right meaning. I don't think that these points had been well discussed previously, however, so we need to see if they are widely accepted. If we come to agreement, it needs to be memorialized at WT:CFI#Idiomatic phrases or somewhere in WT:ELE. DCDuring TALK 15:49, 2 October 2008 (UTC)
Long ago, Paul G, Muke and I think Dvortygirl used the Wiktionary-community standard of "someone's" for all idiomatic entry headings. Other forms are supposed to be hard or soft redirects to the main entry. --Connel MacKenzie 16:04, 13 November 2008 (UTC)

Renaming two topical categories

Or are they lexicon categories and should be treated differently? I am proposing to rename:

__meco 16:09, 2 October 2008 (UTC)

No. They are properly named. They are not topics. The equivalents for other languages would be (e.g.) Category:French words with negative connotations, and so on. Robert Ullmann 17:05, 2 October 2008 (UTC)
There's a problem in that its only parent category is in the topical hierarchy, i.e. Category:Emotions. What to do? __meco 18:11, 2 October 2008 (UTC)
There are some cases where a topical category resides within a grammatical category or vice versa. This is especially true when it comes to the numbers, which are classified grammatically, but also in the topical Mathematics categories. Such situations are rare, and should be avoided when possible, but I don't think we can avoid them altogether. --EncycloPetey 18:23, 2 October 2008 (UTC)
There isn't any reason to "avoid" them, the category structure is hierarchical, but isn't a tree. A category doesn't have to be "in" one "parent" cat. No reason at all why these can't be in the cat for topic "Emotions" (which is useful; also note that that cat is "en:Emotions" except that we don't use the code prefix for English), and also be in the English language cat (directly or indirectly), where they "belong". Robert Ullmann 18:32, 2 October 2008 (UTC)
Shouldn't there be a category linking these to the lexicon hierarchy? __meco 18:52, 3 October 2008 (UTC)

Non-Latin script in etymologies

I think it is a bad idea not to use the Romanized version of, for instance, Greek words (or Chinese). I tried to change this for pornography but was promptly reverted. If this is policy we need to change it. It is most unhelpful to the majority of users to show preference for the original script and to force those who would be interested in seeing (and understanding) what the origins of a word are to open more pages just to be able to see what that word is. __meco 16:52, 2 October 2008 (UTC)

The template accepts tr=, which displays the Romanisation in addition to the native script. The problem with your edit was that you were hiding the native script. See the entry for for an example where both the Greek script and romanisation are used. --EncycloPetey 16:59, 2 October 2008 (UTC)
Very good. I added tr= items to the term template at pornography. That solves the problem as far as I'm concerned (unless somebody decides to remove them again). __meco 17:30, 2 October 2008 (UTC)
"original script"? WTF does that mean? That the word used to be written in Greek and is now written in Latin? No, we write words (in etymologies and elsewhere), as they are written. If someone wants some derived attribute, they can follow the link. Seeing it in the script it is written in is "seeing what the word is". Robert Ullmann 17:02, 2 October 2008 (UTC)
As to what the fuck I meant by "original script", that is the script which is primarily associated with the language the word is classified as. For a Greek word, that would be Greek script. I do not agree with your blunt conclusion of "how we do things". The reason I bring this up is because I think we should pay attention to our raison d'etre and our customers and their needs. It is much less useful for someone who can not decipher a foreign script to "see what the word is" than to be able to read the transliteration and perhaps experience some recognition. As for relegating anyone who is interested in experiencing recognition thusly to follow the link, I addressed this aspect expressly in my first post. I also perceive your tone to be not overly conducive to congenial dialogue. __meco 17:22, 2 October 2008 (UTC)
I see no reason why we're having this argument. Our policy is already quite clear (and has been for some time). We should have both the native script and a romanization. No one is in disagreement here. -Atelaes λάλει ἐμοί 17:56, 2 October 2008 (UTC)

Using both {{wikipedia}} and {{pedialite}} in the same article is overkill

I have been accused by EncycloPetey of removing a link to Wikipedia when I replaced the latter by the former (which is simply not true because the box contains the same link surrounded by different text). It may be debatable if this was a good action but ultimately it is a matter of taste (we seem to agree on that) and I was working on

anyway. EncycloPetey reasons on my talk page further that it is community consensus that both should be used for linking - and obviously it is his opinion that this means in effect that it is a perfectly good idea that both templates are to be used in the same article and for the same language. I disagree. Community, can we agree that it is not a good idea to use both templates in the same article within the same language section, and that I therefore did not do anything harmful? -- Gauss 18:57, 2 October 2008 (UTC)

Replacing standard text links with a courtesy box does not mean that the link wasn't removed. There are many, many pages where the text links are necessary and where the pediabox is still useful as a courtesy. Consider the entry for , where the box link is to the disambiguation page on Wikipedia and the text links are to the individual articles. This is not an uncommon phenomenon. The box is a visual courtesy to alert users quickly that WP articles exist, but it cannot replace the links to the separate relevant articles, or we would have multiple boxes, which has in the past been rejected as visual clutter. --EncycloPetey 19:03, 2 October 2008 (UTC)
The argument with would apply only if the section External links at , which is the subject of this incident, contained a pedialite link other than to the same disambiguation page w:Hawthorn. That section could, and maybe should, contain a direct link to w:Crataegus but it didn't and doesn't. And to the other matter: A link is a link, whether it is in a box or not, and I did not change the link in any way! (In hindsight, I should have used the dab= parameter in {{wikipedia}}. My bad. I can admit mistakes.) Constructive opinions from a third party? -- Gauss 19:18, 2 October 2008 (UTC)
Some editors, including myself, prefer {{pedialite}} (or the equivalent {{projectlink|pedia}}); some editors prefer {{wikipedia}}; having both seems like a decent compromise. A lot of readers won't notice the {{wikipedia}} box, due to banner blindness. I'm not sure if the same is true of {{pedialite}}. —RuakhTALK 22:29, 2 October 2008 (UTC)

Possible import from Wikisource

Hello everyone. I'm not sure if what's discussed at s:User talk:Giggy#Spanish - English Dictionary for Beginners is the sort of thing wanted here, but if it is (see also the page linked to in the header there), then please leave a note on my Wikisource talk page and I can copy across the content. Its author has agreed to release it under GFDL. Giggy 01:34, 3 October 2008 (UTC)

Any administrator seems to be able to import. Where is the item to be imported, please?--Jusjih 18:04, 28 December 2008 (UTC)
Somewhere in the Transwiki: namespace - it should be incorporated into entries as opposed to existing standalone, and that process takes some time. Some of the links to the original seem broken, where is the actual content? Conrad.Irwin 18:16, 28 December 2008 (UTC)

XML dumps, daily spins

As readers of WT:GP know, I've been working on dumps to replace the AWOL WMF XML dumps. It is going quite well, you can find them at with a new dump added daily. Each one is up to the minute with edits and deletes when run, except that as of now, there are ~70K entries and edits from the last 4 months missing; it is adding 30-40K a day, so in 2-3 days will be current. In any case, these are a large improvement on 13 June.

There are two purposes for these dumps; one to make it possible to run our internal reports and lists again, the other to make our content available to other people re-using it in various ways. Sites like Ninjawords haven't been able to get new content from us since June 13th. (I suspect they have started doing dynamic mirroring, which isn't allowed, but under the circumstances, what were they to do?)

The dumps presently include namespace 0 and templates (namespace 10); this is what is needed for our internal stuff. For other users, they were using the primary WMF dump, which is all even-numbered namespaces except 2 (User). (FYI: the odd-numbered namespaces are the corresponding Talk pages.) So to make my dumps more useful, I should probably add some others back in. (At the moment it drops them, who needs a 4-month old copy of Beer Parlour? ;-)

If and when WMF gets their dumps running again (any day now, but it has been any day now since June ...) we will still be able to run daily spins, and not have to wait weeks for each. I was thinking of something like this:

namespace     ns    include   comments
(main)        0     yes       possibly excluding a few remaining oddities
User          2     no
Wiktionary    4     no        not content
Image         6     no        all of our images are from commons
Mediawiki     8     no        not content
Template      10    yes       possibly excluding "User ..."
Help          12    no        not content
Category      14    yes       possibly excluding "User ..."
Appendix      100   yes
Concordance   102   no        not our content, used for finding new entries
Index         104   yes
Rhymes        106   yes
Transwiki     108   no        will appear if moved to mainspace
Wikisaurus    110   no        not for now
WT            112   no        not content, internal shortcuts
Citations     114   yes
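
For anyone filtering a dump themselves, the proposed selection could be sketched roughly as follows. This is only an illustration in Python, not the script that produces the daily dumps: the names INCLUDED_NAMESPACES, is_talk_namespace and keep_page are hypothetical, and the optional exclusion of "User ..." templates and categories is one reading of the "possibly excluding" notes in the table.

```python
# Hypothetical sketch of the namespace selection proposed in the table above.
# Namespace numbers and names come from the table; everything else is assumed.
INCLUDED_NAMESPACES = {
    0: "(main)",
    10: "Template",
    14: "Category",
    100: "Appendix",
    104: "Index",
    106: "Rhymes",
    114: "Citations",
}


def is_talk_namespace(ns: int) -> bool:
    """Odd-numbered namespaces are the corresponding Talk pages."""
    return ns % 2 == 1


def keep_page(ns: int, title: str, exclude_user_pages: bool = True) -> bool:
    """Return True if a page would be written to the dump under the proposal."""
    if is_talk_namespace(ns) or ns not in INCLUDED_NAMESPACES:
        return False
    # Assumption: "possibly excluding 'User ...'" means Template/Category
    # pages whose name (after the namespace prefix) starts with "User ".
    if exclude_user_pages and ns in (10, 14):
        bare_title = title.split(":", 1)[-1]
        if bare_title.startswith("User "):
            return False
    return True


if __name__ == "__main__":
    for ns, title in [(0, "water"), (1, "Talk:water"),
                      (10, "Template:User en"), (108, "Transwiki:foo")]:
        print(ns, title, keep_page(ns, title))
```

Keeping the list as a single mapping means that adding a namespace back in (Wikisaurus, say) is a one-line change.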

Technical discussion, gory details in WT:GP. Any thoughts? Do you use the dumps? Robert Ullmann 15:25, 4 October 2008 (UTC)

I do not run any processes that use the dumps directly. As I understand it, certain lists depend on the dumps to be current. Of those, the ones I use regularly are "Missing", "Not Counted", and the L3 header list. Most of the other lists I use are based on categorization, often by bots. Since the dumps started getting irregular (March '08?), I have not devoted any time to thinking up analyses and maintenance lists that seemed to depend on the dumps. What other lists are run from the dumps? I think that my use is solely dependent on NS:0.

The daily dumps are useful from my point of view only if those lists are updated from the daily dumps. DCDuring TALK 16:14, 4 October 2008 (UTC)

I use the dumps and will be sure to grab the "spin" in a couple of days once it's current. Thank you for all the work. Besides the basic pages XML, I also use the category links file. Any plans for that too? --Bequw¢τ 20:03, 4 October 2008 (UTC)

Slavonic / Proto-Slavic ?

There are separate categories:

AFAICT, Slavonic refers either to a language family (Category:Slavic languages), or to the protolanguage (Proto-Slavic language).

Should there be two separate categories? (I don’t understand the distinction, if one is intended.)

The language template {{Sla.}} generates Slavonic links, and, suspiciously:

…so I suspect that {{Sla.}} should be deprecated and replaced with {{proto}} Slavic links – does this analysis seem correct?

Nils von Barth (nbarth) (talk) 02:13, 5 October 2008 (UTC)

You are mostly correct. {{Sla.}} needs to be analyzed on an entry by entry basis. Most of the entries which use it should be replaced with {{proto}}, as it is a reference to the proto-lang. However, in some cases, the template is used to refer to the language family, and not the proto-lang. We don't have a solid policy on how derivations from language families should work just yet, and so those should probably be left as they are right now. The same problem exists with {{Ger.}}. -Atelaes λάλει ἐμοί 03:16, 6 October 2008 (UTC)
The terms Slavic and Slavonic are synonymous. {{Sla.}} is used for borrowings from an unknown Slavic language (most prehistorical Slavicisms in Albanian, Baltic, Romanian and Hungarian are such, as it doesn't make much sense to speak of individual Slavic "languages" in the period of 6th-12th centuries). Proto-Slavic should be used exclusively for Slavic words that descend from Proto-Slavic reconstructions. Sometimes the Proto-forms are added when Slavic words are added in the etymologies of other families' words, so you get e.g. English word being "derived" from Proto-Slavic, but that practice is IMHO wrong and should not be encouraged (or at least practised with blank lang= in {{proto}}). --Ivan Štambuk 15:27, 6 October 2008 (UTC)

Request for Bot Flag Neskbot

  1. Neskaya
  2. I am requesting a bot flag for Neskbot
  3. The bot is using the pywikipediabot framework. A link to the specific script for it can be found at User:Neskbot/uploadscript.
  4. The bot will be used to semi-automatically add Hiligaynon entries, in batches of no more than 250 at a time. Each edit is manually reviewed before submission. This bot will also add Hiligaynon language sections to existing entries. The bot serves to perform the edits that I would otherwise be performing under my own account, and the bot flag will allow me to process significantly more edits at a time.

The vote for this can be found here.

Thank you. --Neskaya kanetsv 02:19, 5 October 2008 (UTC)

I'm confused, sorry. Are you studying Hiligaynon? Where are you finding these words? —RuakhTALK 02:27, 6 October 2008 (UTC)
Yes, I'm studying Hiligaynon. I began learning Hiligaynon from a classmate in early high school, and am continuing to learn from the caregiver who is taking care of my grandparents. I am getting the words from a copyright-free dictionary that said classmate's grandmother gave me (it had no publishing information, else it would be cited under references) that I used OCR software to produce a PDF of. If you have any other questions please let me know. --Neskaya kanetsv 19:54, 6 October 2008 (UTC)

Using {{l}} to link to words in head of phrase?

See: Template talk:l#Links to words in head of phrase?

The template {{l}} (which generates language links w/o italicizing the entry) is v. useful for linking to individual words or subphrases in the head of an (idiomatic) phrase, as in {{infl|xx|phrase|head=...}}.

It’s useful for this as one does not want to italicize the words – is this an “approved use”?

I’ve started a discussion at Template talk:l#Links to words in head of phrase? & wanted to flag it here so people could weigh in.

E.g., current version of de gustibus non est disputandum has as head:

{{infl|la|phrase|head={{l|la|de}} {{l|la|gustibus}} {{l|la|non}} {{l|la|est}} {{l|la|disputandum}}}}

…which generates correctly formatted links

Does this seem ok? A good idea, even?

Nils von Barth (nbarth) (talk) 14:25, 5 October 2008 (UTC)

Two issues: (1) There is already a language parameter included in {{term}}, so if the template doesn't already link to the correct language section when "head=" is included, shouldn't there be a way to fix the existing template function to do that? (2) That's problematic for Latin, because the head form should include macrons, which your example above doesn't. I think for Latin, it would actually be simpler to use explicit wikilinks. For languages that do not have optional diacriticals (the way that Latin, Arabic, and Hebrew do), your suggestion might be feasible. However, it won't work universally, so it might be better to look for a universal solution. --EncycloPetey 19:36, 5 October 2008 (UTC)
You can use an additional unnamed parameter for {{l}} for forms with diacritics, just like {{term}}; it will be used for display and not for wikilinking, e.g. {{l|la|gustus|gustūs}}, which will link to gustus but display gustūs. --Ivan Štambuk 15:16, 6 October 2008 (UTC)
Ivan, thanks for informing us of the optional parameter for {{l}}!
I’ve documented it there and I believe this addresses EncycloPetey’s concern (2).
EncycloPetey, I don’t follow your issue (1) – could you elaborate?
The issue I have is in linking the words in the head of a foreign language phrase:
{{infl}} doesn’t link words automatically, hence one must do so manually, and {{l}} seems the best way to do it; I don’t see the relevance of {{term}}.
Nils von Barth (nbarth) (talk) 23:14, 14 October 2008 (UTC)
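
For editors who generate such headwords in bulk, the pattern discussed above (one {{l}} per word, with the optional display parameter for forms with diacritics) could be assembled along these lines. This is a rough Python sketch under stated assumptions: l_link and infl_head are hypothetical helper names, not part of any existing bot framework, and the output is plain wikitext that still needs to be reviewed by hand.

```python
# Hypothetical helpers that build the wikitext discussed above.
# They only format strings; nothing here talks to the wiki.
from typing import Iterable, Optional, Tuple


def l_link(lang: str, target: str, display: Optional[str] = None) -> str:
    """Build one {{l}} call; `display` is the optional unnamed parameter
    that is shown to the reader but not used for wikilinking."""
    if display is None or display == target:
        return "{{l|%s|%s}}" % (lang, target)
    return "{{l|%s|%s|%s}}" % (lang, target, display)


def infl_head(lang: str, pos: str,
              words: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build an {{infl}} call whose head= links every word separately."""
    head = " ".join(l_link(lang, target, display) for target, display in words)
    return "{{infl|%s|%s|head=%s}}" % (lang, pos, head)


if __name__ == "__main__":
    words = [("de", None), ("gustibus", None), ("non", None),
             ("est", None), ("disputandum", None)]
    print(infl_head("la", "phrase", words))
    print(l_link("la", "gustus", "gustūs"))
```

Passing the display form separately keeps the link target free of macrons, which is the point of concern (2) above.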

Pronunciation of multi-word idioms

At put the pedal to the metal I have removed a pronunciation section that consisted entirely of an {{rfp}} template. My reasoning for doing this is that the pronunciation of this multi-word idiom is completely predictable from the component words (which are already wikilinked). As such I don't feel it is worth the time required to add the pronunciation to these entries, especially as there are many possible permutations of stress pattern, slurring of word boundaries, etc, for each accent.

Does anyone object to this? Thryduulf 17:06, 6 October 2008 (UTC)

I object. The pronunciation is not entirely predictable for two reasons in this case: (1) The word has two pronunciations. A non-native speaker will not know which pronunciation is to be used in this phrase. (2) The placement of stress in the overall idiom is not predictable from the component words. Multi-word idioms sometimes move stress to new syllables, or emphasize certain components over others. This cannot be predicted from the components. --EncycloPetey 17:14, 6 October 2008 (UTC)
(after edit conflict) The stress pattern of idioms like this is not fixed, depending on the speaker and the context; likewise the pronunciation of words such as "the" in idioms such as this is context (and stress) dependent and not crucial.
In the case of words with differing pronunciations depending on the part of speech or meaning, these are predictable once the meaning of the idiom is known. Thryduulf 17:19, 6 October 2008 (UTC)
O boy did you pick the wrong example (:-) Yes, the pronunciation is usually predictable from the words, but in this case rather spectacularly not! It is often pronounced with either "pedal" as "petal" or "metal" as "medal", matching the consonantal phoneme one way or the other. (and concur with EP, idioms sometimes have particular stress patterns etc) So sometimes a pronunciation section is warranted, either with IPA and so forth, or notes. Robert Ullmann 17:30, 6 October 2008 (UTC)
I can't recall hearing this idiom with the non-standard pronunciation of either "pedal" or "metal", which is why I described it the way I did. Thryduulf 17:35, 6 October 2008 (UTC)
I would not normally request a pronunciation for a multi-word expression. I think that Stephen made the point that this one is not pronounced exactly as one would expect, unless one happened to be a linguist and possibly a specialized one at that. I am not sure that I understand WP's point which may extend beyond this case.
In my experience in the US this is pronounced almost always as "pedal to the medal" (56 bgc hits for this spelling!) or "petal to the metal" (72 bgc hits for this spelling!) (possibly something in between) and hardly ever "pedal to the metal" (700 bgc hits). (For the curious, the truly nonsensical "petal to the medal" gets 5 bgc hits, predictable from the "error" rates for the individual words.) That the pronunciation should leave strong traces in edited works is pretty good evidence I would think. If a non-native speaker is in the advanced stages of simulating the speech of a native speaker, this would help. Phonetic lookup would be a help if we had it for many not familiar with this who first encounter it in speech. DCDuring TALK 17:39, 6 October 2008 (UTC)
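
The remark that the count for the doubly "wrong" spelling is predictable from the individual error rates can be checked with a short calculation. The Python sketch below uses only the four hit counts quoted above and assumes, purely for illustration, that the two substitutions occur independently of each other; the function name is hypothetical.

```python
# Hit counts as quoted in the comment above; the independence of the two
# substitutions is an assumption made for this back-of-envelope check.
HITS = {
    ("pedal", "metal"): 700,
    ("pedal", "medal"): 56,
    ("petal", "metal"): 72,
    ("petal", "medal"): 5,
}


def expected_double_substitution(hits: dict) -> float:
    """Expected count of 'petal ... medal' if each word is
    substituted independently at its observed overall rate."""
    total = sum(hits.values())
    p_petal = sum(n for (first, _), n in hits.items() if first == "petal") / total
    p_medal = sum(n for (_, second), n in hits.items() if second == "medal") / total
    return total * p_petal * p_medal


if __name__ == "__main__":
    print(round(expected_double_substitution(HITS), 2))
```

Under that assumption the expected count comes out at roughly 5.6, against the 5 hits reported.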
I'm confused by your comment. In my dialect, "pedal" and "petal" are homophones, as are "metal" and "medal": all four use the alveolar flap [ɾ]. I thought this was true for almost all forms of U.S. English. (In hyper-enunciated speech people will distinguish them based on spelling, but that's not really relevant to this idiom.) So including a US pronunciation that's strictly SOP, plus three UK pronunciations that are strictly SOP except that two of them each have one phoneme switched, is really not going to help the reader. Are we O.K. with a free-form, usage-note-style pronunciation section, something like, "This idiom is normally stressed on the nouns pedal and metal. In dialects where pedal and metal do not ordinarily rhyme, speakers will sometimes modify one or the other in order to make them rhyme."? —RuakhTALK 17:54, 6 October 2008 (UTC)
I'm not in the habit of listening consciously to such things. Treating my own pronunciation as a reflection of what I've heard, I think that I make a small distinction, but it's hard to tell when I'm being conscious of it. MWOnline and Cambridge Dict. of Amer Eng. show different pronunciations for "pedal" and "petal". DCDuring TALK 18:04, 6 October 2008 (UTC)
I, too, hear a distinct difference in the two words. ...But I also pronounce the "c" in distinct. That's what you get for learning your vocabulary by reading. Amina (sack36) 00:32, 7 October 2008 (UTC)
Doesn't everyone pronounce the c in distinct?—msh210 19:58, 7 October 2008 (UTC)
Actually, no, not everyone does. At the very least, when I'm speaking fast the c slips out of it. --Neskaya kanetsv 00:12, 8 October 2008 (UTC)
Re: MWOnline and Cambridge Dict. of Amer Eng.: Well, I said almost all forms. :-P   Also, I should clarify that flapping is a phonological phenomenon, so I'm not sure whether "pedal" and "petal" are actually phonemically distinct. [Disclaimer: I am not a linguist, and the remainder of this comment should be taken with a large grain of salt.] In an illiterate society, I think unconditioned phonological merging probably implies phonemic merging. (Flapping is conditioned, but in ordinary speech, the condition is always met in pedal/petal and metal/medal, so it's "almost unconditioned" for the purposes of this discussion.) In our society, our literacy shapes our phonemic awareness. It's not perfect — as you point out, people sometimes write "petal" when they mean "pedal" and so on — but it has an effect. So even if flapping were universal to all forms of English, I think dictionary pronunciations might distinguish "petal" and "pedal" at the phonemic level. But to apply that to "put the pedal to the metal", we'd have to somehow determine that speakers are using the "wrong" phoneme. Your spelling comparisons are evidence of this, but I don't think they're terribly compelling. More compelling would be evidence from speakers who don't practice flapping (either because they never do, or because they're really enunciating). Thryduulf is presumably one such, since I don't think flapping occurs in the UK, and he says that he hasn't heard the "wrong" phone; Robert Ullmann might be another, since he now lives in another region where flapping might not occur (no clue), and he says that he has. (But if he's since grown unused to flapping, then maybe he would mis-interpret a flapped speaker's pronunciation as "put the pedal to the medal"?) We don't have a large sample of editors from every region of the Earth, and AFAIK we don't have independent references to use for verification of something like this. 
I think that whatever claims we make in our entry, they should be very cautious. —RuakhTALK 21:55, 7 October 2008 (UTC)
Then, all things considered, perhaps we have to concede that #Pronunciation of multi-word idioms is beyond our capabilities at this time or not worthwhile. It is not as if we already have all the single-word pronunciations covered. DCDuring TALK 23:17, 7 October 2008 (UTC)
Seems to me that if someone actually requested pronunciation be added by use of {{rfp}} then it must not have obvious pronunciation, and the pronunciation is worth adding.—msh210 19:57, 7 October 2008 (UTC)
But like many such requests it's us talking to each other. I placed the request, because Stephen's comment about "flapping" had made me notice or believe that there was something funny about it. If the funny pronunciation is only in some parts of the US, as it seems, and our pronunciations are sourced from the UK (mostly Thryduulf lately), and print sources don't usually cover multi-word pronunciations, the current result seems to be the outcome. DCDuring TALK 20:55, 7 October 2008 (UTC)
Oik! Does that mean no pronunciation guide for "Don't you know"? Where I come from it's "DOAN cha no" and you'd get sniggered at for the correct pronunciation. Perhaps "put the pedal to the metal" wasn't the best example? Amina (sack36) 10:03, 9 October 2008 (UTC)
Nobody would delete a pronunciation provided, I think. Nobody can force a volunteer to provide a pronunciation. The only question, I suppose, is whether we want to permit the removal of request for pronunciation tags to neaten up the entries and the request lists. DCDuring TALK 11:51, 9 October 2008 (UTC)

This reminds me of a phenomenon I have observed in my own use of lists (requests, missing, etc). At some point a list becomes "clogged" with items that I personally can't or won't fix. The motivating factor of the "empty inbox" is vitiated. I'm thinking that in some cases it may be better for me to copy such a list to my own user space and delete items as they are resolved or determined to be beyond my capability or inclination. There might be a technical solution to eliminate the need for the personal userspace solution, but only worth requesting if the underlying problem is shared by many. DCDuring TALK 12:14, 9 October 2008 (UTC)

I've also had this problem (the remnants-I-can't-fix problem). I don't have a solution to offer, but would be willing to help implement a solution you came up with. —RuakhTALK 16:36, 11 October 2008 (UTC)


What is this? Is this a typical application of translations these days? Just so I'm aware, you understand... I'm having a very hard time keeping up with all the template-policy and -practice initiatives. - Amgine/talk 17:16, 6 October 2008 (UTC)

It appears to be an experiment for transcluding translations when there is more than one spelling of a word. In this case, there is a hyphenated and non-hyphenated form. --EncycloPetey 17:18, 6 October 2008 (UTC)
Yes, I understood that... wouldn't "alternative spelling of..." be more suitable, less likely to create a bazillion new templates? How does this define which sense is being translated? etc. It seems likely to result in complexification without real benefit, but that's merely my personal opinion. - Amgine/talk 19:15, 6 October 2008 (UTC)
I agree. I was merely noting that it seemed to be an old experiment, since you asked what this was. --EncycloPetey 19:22, 6 October 2008 (UTC)


I've started a vote on substituting {{es-verb}} in place of the existing {{es-verb-ar}}, {{es-verb-er}}, and {{es-verb-ir}}, so that we use a single template consistently for all Spanish verb lemmata.

A full description of the template's function with examples appears on the template's talk page. Discussion is located at Wiktionary talk:About Spanish#Template:es-verb. --EncycloPetey 00:11, 7 October 2008 (UTC)

Manchu script

Recently, while looking through several articles, I observed in the translation section Manchu translations in Latin script (exempli gratia: here or here). There is even an entry about a Manchu noun, again in Latin script - aniya. I was urged insistently to input entries in the native script (at least as regards Gothic) and I would like to ask: are there objections against creating a [[Category:Articles which need Manchu script]], so that knowledgeable editors be able to input the script where necessary and to widen here the Manchu entries (there is still no Manchu Wiktionary). The script in question is on the right.
I had thought until recently that the digitalisation is not yet possible, until I beheld at this Wikipedia entry the name of the language in digitalised form, but just like Gothic until recently I am unable to see anything but question marks (ᠮᠠᠨᠵᡠ ᡤᡳᠰᡠᠨ). If anyone sees something meaningful, then this proposal about delivering Manchu words in the proper script without inputting images should be accepted and the expression would be bound to become the first Manchu word in Manchu language here. If so, I am ineffably eager to know whether the digitalisation has preserved the original top-bottom writing or it is an adjustment to the prevailing in the digital world horizontal writing? Furthermore: has anyone any idea about whether the Manchu script is expected to be included in the Unicode just like Gothic? Bogorm 13:33, 8 October 2008 (UTC)

FYI, the characters are supported in Unicode 3.0, from 1999. They are supported by five fonts which come with the Mac, and the above text specimen is readable on my Mac. The script renders horizontally left-to-right in both Safari and Firefox on the Mac. It is also supported by the free w:Code2000 font, but it appears to me that the characters are rendered sideways in that font (e.g., so that the rows look correct if you turn your computer screen sideways). Michael Z. 2008-10-09 15:26 z

In Wikibooks

There is some information about the script, but despite the explanation of the script here, in the next lesson they proceed with the Latin script... If there is a digitalisation, one should apprise the contributors on Wikibooks about it so that it is written properly, right? Bogorm 13:44, 8 October 2008 (UTC)

Mongolian (Manchu)

The script is Unicode range U+1800 to U+18AF, script code is Mong. Characters specific to Manchu are included in this range. Mostly what you need is a font.

The coding just puts the characters in order; it is up to the rendering (e.g. browser) to display them vertically if desired.

So the answer is, yes, already there, everything coded. Find a font to download from somewhere ;-) Robert Ullmann 13:52, 8 October 2008 (UTC)
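
To check whether a given headword or translation is already in this script (for instance, when deciding whether an entry belongs in a "needs Manchu script" category), one could test its code points against the range given above. A minimal Python sketch, with hypothetical function names, and with the caveat that the block also covers Mongolian and other languages, so it cannot tell Manchu from Mongolian:

```python
# Minimal sketch: test whether text falls inside the Unicode block quoted
# above (U+1800 to U+18AF). The function names are illustrative only.
MONGOLIAN_BLOCK = range(0x1800, 0x18B0)  # upper bound is exclusive


def in_mongolian_block(ch: str) -> bool:
    """True if a single character lies in U+1800..U+18AF."""
    return ord(ch) in MONGOLIAN_BLOCK


def uses_only_mongolian_block(text: str) -> bool:
    """True if every non-space character of `text` is in the block."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(in_mongolian_block(ch) for ch in chars)


if __name__ == "__main__":
    # The specimen quoted earlier in this discussion, written with escapes.
    specimen = "\u182e\u1820\u1828\u1835\u1860 \u1864\u1873\u1830\u1860\u1828"
    print(uses_only_mongolian_block(specimen))
    print(uses_only_mongolian_block("aniya"))
```

This says nothing about rendering: as noted, displaying the characters vertically is up to the font and the browser.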

Mongolian sounds pretty strange, since Mongolian is written in Cyrillic script and Mongolian is not a Tungusic language like Manchu. Following your elucidation, I conclude that a category [[Category:Articles which need Manchu script]] similar to the categories for Gothic, Cyrillic and so forth is exigent and I am going to create it and put aniya there. Are there any objections? Bogorm 14:11, 8 October 2008 (UTC)
Mongolian is now usually written in Cyrillic. But it was written in a Uyghur script, as is Manchu. (Why the script block is named "Mongolian" rather than "Uyghur" I don't know; probably just the more familiar name.) See w:Mongolian script for a larger explanation. Robert Ullmann 14:23, 8 October 2008 (UTC)
Well, but Mongolian and Turkic languages have nothing to do with Tunguso-Manchu languages, which are a completely separate family. The Manchu script descends from the Jurchen script, which according to Wikipedia descends from Khitani script. And Khitan people are by far different from Mongolians... Some speculators believe that the Tunguso-Manchu group of languages and the Japanese language were allegedly Altaic languages, but this goes over the top (not according to me, but to the huge contesting group of venerable linguists) and I find it highly dubitable, they three are just in a neighbourhood area. So - Mongolians and Turkic people are similar, Japanese and Tunguso-Manchu - something entirely different. Bogorm 14:36, 8 October 2008 (UTC)
Totally wrong: the current Manchu script is descended from the classical Mongol script; it's the Manchu language that's descended from Jurchen. —This unsigned comment was added by (talk) at 20:54, 29 December 2008 (UTC).

Digitalisation in templates

If the script can be written as in this template about the language on German Wiki, then we should also adhere to the proper script, should not we? Is there something visible for anyone? Bogorm 13:57, 8 October 2008 (UTC)

I can see the characters, but as the template itself notes, the lines are all backwards. This is a Right-to-Left language script, but the lines are displayed as Left-to-Right because the reversal that would display them correctly is not yet supported. --EncycloPetey 17:53, 8 October 2008 (UTC)
In any case, I absolutely fully support tagging such entries with a script request. Many of the entries currently tagged for various scripts will be waiting for some time, as we have no one who can handle the script, and some are tagged with a script which is not even Unicode supported yet. However, I see no problem as this creates a convenient worklist for when we have the skills/technology at our disposal. Please feel quite free to use {{rfscript|Mongolian}} on anything you see which should have Manchu script but doesn't. In general, this should be true for any and all situations where a native script is not present. -Atelaes λάλει ἐμοί 19:01, 8 October 2008 (UTC)
EP: the script isn't RTL: it is top-to-bottom vertical, with the lines then ordered left-to-right. (And yes, it/they had an RTL origin, then rotated to vertical; so if one was to "force" horizontal presentation, RTL makes more sense. But it should be vertical, and hence is not in the RTL part of Unicode/UCS.) Robert Ullmann 14:47, 9 October 2008 (UTC)

WT:CFI and "Well-known work"

At the moment a term can appear in Wiktionary after one occurrence in a "well-known work". The interpretation of this is very subjective, and I would like to propose that we remove this proviso from the CFI. With templates such as {{only in}}, there is no need to fear that removing an entry from the dictionary will remove it completely; so we could use our "Concordance" or "Appendix" namespaces to include terms from any published work - as indeed is already started, see Appendix:Harry Potter terms or Concordance:A Clockwork Orange. The main thing to be decided, if there is agreement on this being a sensible course of action, is on what format these non-main-namespace pages should take; however I don't intend to wrangle too much about that until we've decided this would be a good thing to try. Conrad.Irwin 14:28, 9 October 2008 (UTC)

While "well-known" may be technically subjective, in practice it works extremely well: we have not had any dispute I can recall about whether a given work is "well-known". There are a small number that clearly qualify, a vast number that of course do not, and little grey area. This part of CFI is not broken, and I thus object to any attempt to "fix it".
The issue was raised at Talk:bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk in which a contributor referred to our policy as "ridiculous". This instant case in fact establishes the contrary: Finnegans Wake is unquestionably a "well-known work", not in any grey area, and thus reinforces the reasonableness and validity of our policy. Robert Ullmann 14:41, 9 October 2008 (UTC)
There is some disagreement over Harry Potter, I am not suggesting its removal purely because of subjectivity, but more because I feel it inappropriate to distinguish some authors from others. Conrad.Irwin 14:47, 9 October 2008 (UTC)
In my opinion, we should follow the appendix practice if:
  • A very large number of terms were coined and most have not fallen into common usage (like A Clockwork Orange)
  • or most of the terms do not have translations; they are not synonyms for existing concepts but rather names for nonexistent concepts
Otherwise, the well-known work addition practice works fine. Teh Rote 14:51, 9 October 2008 (UTC)
We would miss no valuable principal namespace entry if we eliminated the well-known work exception to our attestation criteria. Finnegans Wake always struck me as a well-known title, not a well-known work. Pynchon, w:Nabokov, Burgess, and Tolkien also have a penchant for coinage and for the resurrection of rare words that are usually not taken up more widely. The distinction we make favoring them over George Lucas, Gene Roddenberry, J. K. Rowling, George Herbert, and Neal Stephenson is a throwback to a more elitist kind of reference work than a wiki-based work at WMF should be, I think. DCDuring TALK 15:02, 9 October 2008 (UTC)
I will highly object to changing this aspect of CFI. We exist to define "all words in all languages". I think we leave too much out, personally (place names, etc.); this would exclude far too many terms. sewnmouthsecret 21:05, 9 October 2008 (UTC)
I like that clause and have no desire to eliminate it, but I would certainly hear out a proposal to clarify or otherwise improve it. —RuakhTALK 16:41, 11 October 2008 (UTC)
I agree entirely with Ruakh. The point about words such as the above is that they could possibly be used in another work, or article, at any time. And when that happens, Wikt should be there to help the reader understand it. However, OTOH, I think that "well-known work" should come with a start date. For instance, Shakespeare goes without saying. Edward Lear has given us runcible spoon which was once a nonce word. beamish similarly has come to have normal usage. But if we get too recent, then we are faced with all the Harry Potter garbage (IMHO) and similar, just because a lot of people know the work. If in, say, 30 years time Harry Potter is still well-known, then it would fall into the same category as Lord of the Rings and The Hobbit; books that are only just out of the grey area now. -- ALGRIF talk 10:05, 13 October 2008 (UTC)
In practice the exemption of words from certain favored well-known works allows us to venerate words from antiquarian or obscurantist literary English by including a modest number of words that were coined in select literary works, but that were not taken up widely. It serves no other purpose, AFAICT. It does not enable us to anticipate the popularity of words from such sources. "[[Beamish]]" and "[[runcible spoon]]" are examples of words that would not be excluded by the elimination of the exemption. Nor would many of the Jabberwocky nonces. The various dead nonce words and bits of eye-dialect in Finnegans Wake would be. An appendix for Finnegans Wake nonces and the use of "only in" tags to direct English majors, graduate students, and other antiquarians to that appendix should be a perfectly adequate solution. It has been deemed good enough for many categories of live words that are hard to cite but with specialized usage such as Military Slang and, yes, Harry Potter.
If I had pursued the well-known work exemption for [[Brer Rabbit]], would that have been permitted? DCDuring TALK 10:47, 13 October 2008 (UTC)
Are you then advocating no entry until an appendix is built? Or normal dictionary entry moved to appendix when (if) one is built? -- ALGRIF talk 11:50, 13 October 2008 (UTC)
"Only in" with a redlink to the appendix to be created would be one way. We could have the citation entry, perhaps integrated with the only in and with the appendix in some way (vague hand-waving). We could let our contributors suggest appendices and have some page that listed the appendices and the redlinks to "wanted" Appendices. We could have templates to facilitate the creation of such appendices. It is just a question of making sure that all second-class citizens are treated the same, until we come up with third- and fourth-class citizenship status. DCDuring TALK 15:04, 13 October 2008 (UTC)

Suggestion re News for editors

I think the Wiktionary:News for editors page is a very good idea, but suspect that it is overkill for it to display at the top center of every page. Assuming that most WT visitors are, or soon will be, users looking things up rather than editors, wouldn't it be more appropriate to have this link display just on project pages and on the editing page? -- WikiPedant 20:36, 9 October 2008 (UTC)

Once it is on a user's watch list, it would be better if they didn't have to see it elsewhere at all. Why on editing? Perhaps it could be put on everyone's watch list by default, permitting each user to unwatch it at will. DCDuring TALK 21:09, 9 October 2008 (UTC)
Or just before the "log out" link? -- ALGRIF talk 13:07, 12 October 2008 (UTC)
Sorry to resurrect the topic, but I think the current implementation is horrible. Why not incorporate a system similar to Wikipedia's (which I also recently installed over at Wikiquote), which is to present a user-dismissable line on the watchlist? You can actually put the news items, rather than just a link, and it won't be in the way of the average reader, but will instead be targeted at actual project editors. EVula // talk // 23:58, 6 December 2008 (UTC)

SEO spam on Wiktionary:Main Page

Why is there SEO spam at the beginning of Wiktionary:Main Page (see the source)? Can any admin remove it, please? Wiktionary does not need this kind of nasty cheating; besides, it makes the page less usable for people with screen readers, text browsers, stylesheets turned off etc., because they see or hear unnecessary garbage. Thanks.

Danny B. 19:25, 12 October 2008 (UTC)

I think the point of them originally was to make our <meta keywords=""> be useful (which is not cheating, it's actually good practise), but that seems to have been broken at some point by software updates. It'd be nice if we got but I see no reason not to remove the links. Conrad.Irwin 08:36, 13 October 2008 (UTC)

As Brion pointed out, meta keywords are obsolete. Annoying users with unnecessary and unwanted garbage (aka spam) seems a good enough reason to remove it. I have described many situations above, and there are other reasons as well, e.g. the unnecessary lengthening of the code (2110 useless bytes), which wastes bandwidth and time: think about users on cell phones or dial-up connections or with FUP. There are many good reasons to remove it; besides, as I mentioned above as well, it is a very shady practice and can result in a penalty from search engines. In sum: there is no reasonable reason to keep it there, but there are many reasons to remove it.

Danny B. 01:15, 6 December 2008 (UTC)

Removed. 50 Xylophone Players talk 01:29, 6 December 2008 (UTC)
Do we have any way of knowing whether what we did ever did us any good (or bad)? DCDuring TALK 02:02, 6 December 2008 (UTC)

Are we able to set a meta description element in the HTML? This could eventually lead to an improvement of the description in Google's search results, which are currently not ideal: “26 Nov 2008 ... Welcome to the English language Wiktionary, a collaborative project to produce a free-content multilingual dictionary. ...” If so, can we also set a description for each entry? Michael Z. 2008-12-06 03:35 z

Sadly, thanks to a total lack of understanding we aren't able to do this - though the technical solution has been implemented. See . Conrad.Irwin 03:43, 6 December 2008 (UTC)

...and the result is that Wiktionary no longer comes up in the first page of Google listings when I type in words or "definition of [word]". So, our MW solution is apparently a "final solution" for Wiktionary. Too bad the guys don't understand the difference between an internet encyclopedia and an internet dictionary. --EncycloPetey 08:30, 6 December 2008 (UTC)

Well, Google does successfully parse Wiktionary entry format (and other language wiktionaries too with defl=<ISO code>) when using the define: operator in the search box, so they're probably very much aware of Wiktionary as a source of dictionary definitions. Moreover, if you type e.g. definition of thought into Google, it will list as the very first search results "Web definitions for thought", and when you click on it you get Wiktionary definition lines for the searched lexeme. If I were a techy guy thinking on how to do "SEO optimisation" of Wiktionary, it would be on how to enable Google to parse definition lines more properly, as inline example sentences and similar stuff seem to screw it up. --Ivan Štambuk 17:38, 6 December 2008 (UTC)
Out of curiosity, has anyone technically minded read [1]? -Atelaes λάλει ἐμοί 08:39, 6 December 2008 (UTC)
I would like to see the removed words back. The saving in the size of the main page amounts to 2%; the page with the words has 18,218 bytes, while it has 17,871 bytes without them[2]. It is not spam--sending of unsolicited messages; it is telling the search engines about the kind of content that Wiktionary actually provides. The talk about people with various kind of devices is irrelevant; the only relevant information is the figure of 2%. --Dan Polansky 13:50, 6 December 2008 (UTC)
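The percentage quoted above can be checked directly from the two page sizes given in the comment. This is only a sketch of the arithmetic; `percent_saved` is a name made up for illustration, and the byte counts are the ones stated in the comment, not independently measured.

```python
def percent_saved(before_bytes, after_bytes):
    """Share of the original page size, in percent, removed by the change."""
    return 100.0 * (before_bytes - after_bytes) / before_bytes


if __name__ == "__main__":
    before, after = 18218, 17871  # sizes quoted in the comment above
    print(f"{before - after} bytes saved, "
          f"{percent_saved(before, after):.1f}% of the page")
```

With those figures the difference is 347 bytes, or roughly 1.9% of the page, which rounds to the 2% cited.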
The incoherence of the arguments made against this makes one wonder about unstated arguments. For example:
  1. Is wiktionary more valuable to WMF as a provider of content to other on-line dictionaries than as a direct provider to users?
  2. Does WMF not want more user traffic on the servers?
  3. Is Wiktionary traffic deemed less valuable than traffic for other WMF content?
  4. Does WMF in some way view itself as competing with search engines for the hearts and minds of users?
Is anyone aware of any information at WMF that would shed light on the answers to these questions? Is there anything in the record of WMF decisions (to the extent they exist) that would allow one to make reasonable guesses about any of these questions? The null hypothesis is normal human fallibility. DCDuring TALK 14:05, 6 December 2008 (UTC)

This is blatant SEO spam and should be removed from the main page. Google recommends avoiding hidden text and links which “can cause your site to be perceived as untrustworthy since it presents information to search engines differently than to visitors,” including “Using CSS to hide text.”[3]

Furthermore, this is less open. A web page must be usable without the CSS—this is a Priority 1 accessibility checkpoint.[4] Putting a list of useless links at the top of the page is specifically punishing readers who rely on assistive technology, and may be an inconvenience to many users of alternative browsers, as in mobile phones, etc.

Finally, this is a lame attempt to use SEO techniques (read “scumbag spammers”) on Wiktionary. What the hell are we doing, trying to increase our profit margin? Nuke it. Michael Z. 2008-12-06 16:36 z

I have seen no evidence that plain-vanilla meta keywords are considered "spam" or undesirable by search engine operators. They may have been so abused that it is simpler for search engines to ignore them completely, rendering them ineffective. Does having them put us on some kind of blacklist? That we try to keep landing pages devoid of repetition of helpful descriptive information that appears on our home page while making it easier for search engines hardly seems reprehensible. We were hardly making our pages less usable or accessible with these specific keywords. We were simply trying to make it easier for people other than ourselves to find useful material. If a new search engine should wish to use the meta keywords for trustworthy sites, should we forget them and defer instead to some folks' interpretation of what the search engine gods want? Why does this make you so upset? DCDuring TALK 17:18, 6 December 2008 (UTC)
Mzajac is not referring to the meta keywords, but to the main topic of discussion: the CSS-hidden content on our main page. (See Wiktionary:Main Page?diff=5659462.) —RuakhTALK 17:45, 6 December 2008 (UTC)
Thanks, I hadn't known how to get back to the source after the change was made. But seeing the changes, I don't understand why these terms, which are accurate alternative descriptions of our site, should engender so much hostility. They might be useful to some new search engine and could be readily ignored. Perhaps the word "free" should be excluded if it is a red flag indicating an untrustworthy site. How could anything possibly be troublesome in substance or method? DCDuring TALK 18:09, 6 December 2008 (UTC)
Hidden text is untrustworthy, and makes search engines suspicious of our site—“text or links there solely for search engines rather than visitors.” In the link above, Google makes it pretty clear that this is considered spamming and not to do it. “If your site is perceived to contain hidden text and links that are deceptive in intent, your site may be removed from the Google index, and will not appear in search results pages.” There are no ifs about this. If you're in doubt, read Hidden text and links. I'm removing this from our home page immediately. Michael Z. 2008-12-06 19:57 z
On my third reading of the Google link, I still only see that they are concerned with deceptive text. If they were concerned with any css or any hidden text of any kind, they could ignore it or they could have a much shorter article at that link. Any unilateral action on this matter seems wholly inappropriate in the absence of consensus. DCDuring TALK 20:23, 6 December 2008 (UTC)
Yeah, it's pretty subtle.
  • “Hiding text or links in your content”
  • “Using CSS to hide text”
  • “text or links there solely for search engines rather than visitors”
  • “anything that's not easily viewable by visitors of your site.”
  • “any text or links there solely for search engines rather than visitors”
  • “If you do find hidden text or links on your site, either remove them or, if they are relevant for your site's visitors, make them easily viewable”
They don't say “they could ignore it”—you made that part up. They say “your site may be removed from the Google index, and will not appear in search results pages.” Seriously, man. Let's not do this. Michael Z. 2008-12-06 20:37 z
Unabridged and unedited beginning of Hidden text and links, but with emphasis added in bold:

<excerpt> Hiding text or links in your content can cause your site to be perceived as untrustworthy since it presents information to search engines differently than to visitors. Text (such as excessive keywords) can be hidden in several ways, including:

   * Using white text on a white background
   * Including text behind an image
   * Using CSS to hide text
   * Setting the font size to 0

Hidden links are links that are intended to be crawled by Googlebot, but are unreadable to humans because:

   * The link consists of hidden text (for example, the text color and background color are identical).
   * CSS has been used to make tiny hyperlinks, as little as one pixel high.
   * The link is hidden in a small character - for example, a hyphen in the middle of a paragraph.

If your site is perceived to contain hidden text and links that are deceptive in intent, your site may be removed from the Google index, </excerpt>

Only if one ignores the bolded words, can one interpret their advice as a prohibition. DCDuring TALK 21:02, 6 December 2008 (UTC)
Dude, they say not to have hidden text. Michael Z. 2008-12-06 21:35 z

In addition to Google and the W3C, Yahoo and Microsoft also say the same thing:

Yahoo: “What Yahoo! Considers Unwanted: . . . The use of text or links hidden from the user.”[5]

MSN Live Search: “The following techniques aren't appropriate . . . Using hidden text or links. Only use text and links that are visible to users.” [6](stupid page with hidden content, if you can believe it: click “Techniques that might prevent your website from appearing in Live Search results.”) Michael Z. 2008-12-06 22:15 z

Page title

A way to convey information to readers and search engines, and in search results, is the page title. It currently reads “example - Wiktionary”. It is set by a string at MediaWiki:Pagetitle. Anyone know if it's possible to add a conditional so that in the main namespace it shows something like “example: definition from the free dictionary – Wiktionary”? Michael Z. 2008-12-06 22:33 z

It then becomes a visible pathetic mess, and we will never get the correct code implemented. (What are you going to make the page title? "word - Definition (define) from Wiktionary (free dictionary), meaning the meaning of the word"? There are at least 5 or 6 words we need.) The bug just needs to be re-opened or duped, and explained over and over and over again until it sinks in that this has utterly nothing to do with "SEO" and everything to do with just matching searches at all.
Brion is just completely over-loaded, has no time to accept the simplest solutions to severe problems (this refers to DCDuring's questions), and thus we see things like XML dumps not working for 8 months, when the temporary fix was trivial, but no-one had time to spend 15 minutes doing it so we could at least have the current dumps for all that time. (From what I can tell, there aren't any real backups either; if something systemic causes, say, the deletion of a whole bunch of images, they are gone. There is a reason I run daily XML dumps: that way I know there is a backup of my own work ...)
Back to topic, the fix is known, properly coded, and simply needs to be installed. What that takes politically we shall see. WMF really doesn't care about the wikts, and the pedia is overwhelmed with things that have nothing to do with content. (3AM rant ends here) Robert Ullmann 23:30, 6 December 2008 (UTC)
I wonder if we just aren't doing a good enough job of serving anonymous users.
I note that today's WotD, hypotrochoid, does not appear on Google Search for that word until result 52. Two other sites that mention the word as being Wiktionary's WotD appear ahead of us. WP (#1) and Commons appear ahead of us, as do a few online dictionaries. WP has meta keywords. MWOnline has the following line in the source for the landing page for the term:

<meta name="Keywords" content="hypotrochoid, definition, define, meaning, dictionary, glossary, free, online, english, language, word, words, webster, websters, merriam-webster" />

Having such keywords seems neither to help them unduly nor to disqualify them. DCDuring TALK 00:11, 7 December 2008 (UTC)
it will help cause the entry to match the search at all; as you note, it will not change page rank, that isn't what we are trying to do Robert Ullmann 00:41, 7 December 2008 (UTC)
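To compare pages on this point, one could extract the meta keywords from a page's HTML source and see which search terms they would cover. The following is a rough sketch using only the Python standard library; the class and function names are hypothetical, and it says nothing about whether any search engine actually reads the element.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated terms of <meta name="keywords"> elements."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; the value "Keywords" does not.
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(
                term.strip().lower()
                for term in content.split(",") if term.strip())


def meta_keywords(html):
    """Return the list of meta keywords found in an HTML string."""
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


# The line quoted above from the MWOnline landing page.
SAMPLE = ('<meta name="Keywords" content="hypotrochoid, definition, define, '
          'meaning, dictionary, glossary, free, online, english, language, '
          'word, words, webster, websters, merriam-webster" />')

if __name__ == "__main__":
    terms = meta_keywords(SAMPLE)
    print(terms)
    print("covers 'define':", "define" in terms)
```

Run against the quoted line, the list includes both "definition" and "define", which are the two search terms discussed in this thread.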
More (not ranting): what Michael is suggesting that we do in the page title we already have. Msh210 set up Mediawiki:Tagline (which see), which is picked up by Google. It is set display:none by default (in Monobook), which is why you don't see it under the L1 header as in the 'pedia. As noted, Google does pick it up and does use it in the "snippet" sometimes. (Search "wiktionary definition of hypotrochoid", without quotes; we aren't looking for that as a phrase.) But note that if you search "wiktionary define hypotrochoid" you won't see it: it won't match the search term "define". We could add that to the tagline, and hide the tagline from all skins? But in any case, what Michael suggests in this section is already done. Nothing to fix, don't need to mung the page title. (mimi ninataka kulala ...) Robert Ullmann 00:41, 7 December 2008 (UTC)
Unfortunately, the hidden tagline is another example of what every search engine and accessibility guideline recommends against. Why are we hiding text on the page? If this is not desirable for sighted readers, then why subject readers who are already having to struggle with assistive technology to it? Michael Z. 2008-12-07 00:57 z

I'm proposing improving the page title. Whether we can set the meta keywords or not, it would still be an improvement to have the title actually say what the page is: this would improve its usability in search results and in your browser's history menu. And it would actually have an effect on search engine ranking, unlike meta keywords and hidden text on the page. Michael Z. 2008-12-07 00:57 z

Adding repetitive visible boilerplate text is undesirable to the extent that it takes up space which could be used to put more above-the-fold content on the landing page. We should want to increase the odds of actually delivering on the first screen either the content the user wants or a clear one-click link to that content. I suppose that 6-point type of a description of Wiktionary on each landing page wouldn't be terrible, but I cannot see its value to the user. Non-deceptive text that users don't see and don't want or need to see but which increases the odds of getting a match to a user's search for the answer to a reference question can't be a bad thing. DCDuring TALK 01:54, 7 December 2008 (UTC)
The page title is the text that appears in most browsers' window title bar, and the history menu, and in web search results—the stuff in the HTML <title> element. It is not on the page at all. I'm talking about improving the information about each page which the reader sees in every context except for the page's content. Michael Z. 2008-12-07 06:25 z

Formatting captions of pictures

Out of habit, gained after I have seen some models, I have been putting a period at the end of captions of pictures, such as "Navy pea coat." instead of "Navy pea coat". But I have also seen captions at Wiktionary formatted without a period. Is there any shared convention on Wiktionary for putting or not putting periods there? Any recommendations for me? Thanks. --Dan Polansky 07:33, 13 October 2008 (UTC)

As far as I know we have no specific convention, as I have seen images both with and without a period for my entire time here. However, someone more familiar with policy may say differently than I do. --Neskaya kanetsv 17:47, 13 October 2008 (UTC)
I don't think there's a standard convention for it, but seeing a caption end with a period when it's a simple declaration drives me up the wall. :) EVula // talk // 18:54, 13 October 2008 (UTC)
Periods (full stops) should be used at the end of sentences (and nowhere else). SemperBlotto 18:58, 13 October 2008 (UTC)
It's not that simple. Periods are legitimately used a number of places besides at the ends of sentences (e.g. after abbreviations or numbers in a list). In many dictionaries (e.g, the OED, the Random House, the 1928 and 1913 Webster's, American Heritage), there is indeed a period at the conclusion of each sense's definition, even if that defn is not a grammatically complete sentence.
The MLA Style Sheet says that periods should be used between the elements of references in academic writing, and gives lots of examples.
As for captions, my Chicago Manual of Style (14th ed.) says that periods are not used in legends (caption headings) or caption text consisting of partial sentences, unless the legend immediately precedes the caption text, and gives this e.g.:
Fig. 21. Augustus addressing his troops. This portrayal of the emperor deliberately harks back to old times.
I tend to follow the Chicago Manual of Style and do not use periods in simple captions which are not full sentences, but I do use periods if there is more than one element in the caption (even if the elements are not full sentences). -- WikiPedant 20:12, 13 October 2008 (UTC)
For what it's worth, I have been doing a similar thing with etymologies. When there is a single element (which is almost never a complete sentence, e.g. "From French foo"), I leave it without a period. However, when there are multiple elements (e.g. From French foo. Compare German fooer, Norwegian fö.) I end each statement with a period, for the sake of clarity. -Atelaes λάλει ἐμοί 22:42, 13 October 2008 (UTC)

Wow, thanks a lot, especially to WikiPedant. --Dan Polansky 07:03, 14 October 2008 (UTC)

A good rule of thumb is not to add redundant punctuation to any element whose nature is exposed by design or typography. For example, bolded headings shouldn't be followed by colons, bulleted list items shouldn't end with commas, semicolons, or periods, and captions tied to images don't need terminal periods, especially when they are both bounded by a box. Another clue is that these things are often sentence fragments standing alone.
If such an element gets more complex, then it requires some written structure from commas, semicolons, colons or full stops, maybe just separators, or maybe a terminator too. This includes bibliographic references and complex list items. The Augustus example of a sentence fragment followed by a sentence is just fine by me.
Of course, dictionary entries have an even more complex structure defined by typography and punctuation, but we're mostly doing okay so far. Michael Z. 2008-10-17 20:42 z

One Million

As you may know, the statistics counter is significantly off. We have passed one million entries. I've done a bit of analysis while doing XML dumps. Count is all namespace zero, not redirects, and not "misspelling of" or "only in".
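For anyone who wants to repeat such a count from a dump, here is a minimal sketch of the rule just stated (namespace zero, no redirects, no "misspelling of" or "only in" pages). It is not the script that was actually used. It assumes a dump in which each `<page>` carries an `<ns>` element and a `<revision><text>`; the function names and the sample data are hypothetical, and matching templates by a lowercase substring is a simplification.

```python
import io
import xml.etree.ElementTree as ET

# Assumed template prefixes marking pages that should not be counted.
EXCLUDED_TEMPLATES = ("{{misspelling of", "{{only in")


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def count_entries(stream):
    """Count namespace-0 pages that are neither redirects nor excluded."""
    count = 0
    for _event, page in ET.iterparse(stream, events=("end",)):
        if local_name(page.tag) != "page":
            continue
        ns, is_redirect, text = None, False, ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "ns":
                ns = (node.text or "").strip()
            elif name == "redirect":
                is_redirect = True
            elif name == "text":
                text = node.text or ""
        body = text.lstrip().lower()
        if (ns == "0" and not is_redirect
                and not body.startswith("#redirect")
                and not any(t in body for t in EXCLUDED_TEMPLATES)):
            count += 1
        page.clear()  # free memory; real dumps are large
    return count


# Tiny made-up dump: one countable entry and three pages that are skipped.
SAMPLE = b"""<mediawiki>
<page><title>word</title><ns>0</ns>
  <revision><text>==English== [[definition]]</text></revision></page>
<page><title>wrod</title><ns>0</ns>
  <revision><text>{{misspelling of|word}}</text></revision></page>
<page><title>colour</title><ns>0</ns><redirect title="color"/>
  <revision><text>#REDIRECT [[color]]</text></revision></page>
<page><title>Talk:word</title><ns>1</ns>
  <revision><text>chat</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    print(count_entries(io.BytesIO(SAMPLE)))
```

On a real dump one would pass an open (possibly decompressed) file object instead of the in-memory sample; dumps that encode the namespace only in the title prefix would need a different namespace test.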

It is said that if you ask a question of 10 economists, you will get 10 different answers; if one is from Harvard, you will get eleven different answers. In that spirit, and in the tradition of judge's decisions in Wiktionary contests being self-appointed, arbitrary, capricious, and final:

Bot entry:

allanaste, Spanish verb form, by BuchmeierBot

Excepting bots:

, Symbol, by Bequw

Excepting bots and symbols and the like:

SFP, Acronym/Noun, by anonymous IP

First actual word:

svetlost‎, Serbian noun, by Dijan

At 7:57 UTC 16 October. Robert Ullmann 15:49, 17 October 2008 (UTC)

This is so cool. Now, where's the press release? :-P Teh Rote 22:03, 17 October 2008 (UTC)
But does this count include "bad" entries, i.e. those without wikilinks? Such pages are usually excluded from the statistics. --EncycloPetey 00:42, 18 October 2008 (UTC)
The script that was used to update the stats after the "pause" due to DB problems counts all outgoing links, not just brackets in entries, while the ongoing increments are based on brackets. So the stat counter will double-count an entry when a link is added, while missing others. It is basically crap, doesn't mean anything. Yes, people pay attention to it, not having anything else ... Robert Ullmann 02:30, 18 October 2008 (UTC)
And I'd like to say right off the bat that if, for whatever reason, it turns out that Wikimedia officially declares fr to have gotten to a million first, it would look pretty petty if we made a big stink about it. -Atelaes λάλει ἐμοί 00:44, 18 October 2008 (UTC)
If we think the count of one million is correct, then we should post at Wikimedia News. I haven't seen the data, so I'm following the (probably flawed) statistics counter for now. --EncycloPetey 00:50, 18 October 2008 (UTC)
We can blow that away quickly too, but as noted, it means jack shit. Robert Ullmann 02:30, 18 October 2008 (UTC)
So what kind of data do you want? It takes running through the entire db, counting the entries. Take the Oct 17th XML dump and do it yourself? Or what? (I'm not being snarky, just wondering what you are looking for?) Robert Ullmann 02:33, 18 October 2008 (UTC)
Having never worked with a dump (and not knowing how), I'm not really sure. I'd prefer that you post our milestone at Wikimedia News yourself, since you're the one that did the analysis and therefore would be able to confidently answer critics who point out that our site-based page count doesn't say one million yet. Maybe this will prompt someone to fix that... Supposedly XML dumps are now possible and "on the way", at least according to a conversation I had a couple of days ago on #wikimedia-tech. --EncycloPetey 03:17, 18 October 2008 (UTC)
So here's a question: How have these sorts of things been done in the past? Did someone simply watch RC and declare the winning entry to be the one which turned the RC counter to whatever number? Certainly there must be a more systematic way of doing it than that. How did they decide the ten millionth all-lang wikipedia article? It seems to me that we should follow the system that every other project has been using, whatever that is. On an almost related note, some projects seem to be noting everything (e.g. number of articles, number of edits, number of users) on that news page. Maybe we should note our millionth block, when the time comes (if we haven't done it already). That's something I would be proud of. -Atelaes λάλει ἐμοί 06:48, 18 October 2008 (UTC)
When I've done it in the past, I've noticed the counter just over the mark, and counted backwards. Now that we have XML dumps current (EP: look at even if you don't want to grab any; we have dailies) it is easier to simply count from the data. We probably should wait until that counter {NUMBEROFARTICLES} goes over 1 mil before any announcement; should only be a few hours anyway. (Oh, BTW, the WM XML dumps are running, but Brion has failed to fix the queueing problem; there are 5 threads running, all stuck on huge projects; the en.wp dump has an ETA in February ;-) Robert Ullmann 12:10, 18 October 2008 (UTC)

one million on the counter at 15:57 UTC, entry was good job ... fr.wikt had 995,002 at that moment. Robert Ullmann 15:59, 18 October 2008 (UTC)

Darn it, I already posted it to Wikinews before you mentioned that! I'll have to correct it. Teh Rote 18:36, 18 October 2008 (UTC)
Remember that that counter is bogus. Counting all the entries in NS:0 that we want to count, the millionth entr(ies) are as I listed above. But the counter was reset recently using actual outgoing links (which is closer to the way we'd like to count). Counting with the brackets algorithm that the counter uses for updates, we are at 995,606 right now .... Robert Ullmann 16:23, 19 October 2008 (UTC)

Kudos to everyone involved! Really nice achievement. I think it is time to remove the rather strange claim that Wiktionary has around 300,000 articles. "Excluding these 163,000 entries, the English Wiktionary would have about 137,000 entries" (c) Wikipedia TestPilottalk to me! 12:18, 20 October 2008 (UTC)

one million if the counter was correct (;-) If the counter was doing what it should, and not off by ~7000 ... counting entries with [[ in them, the millionth entry is

  • (including bot): insertad
  • (excluding bot): stuntwoman created as gibberish by an IP-anon, and turned into a proper entry by Nandando

at about 23:52 UTC 20 October. So there are two more candidates ... (smirk) Robert Ullmann 12:12, 21 October 2008 (UTC)
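For readers trying to follow why the candidates differ, here is a minimal sketch of the two counting rules described in this thread. It is not the script used for the analysis and not the MediaWiki counter itself. The `Page` tuple, the sample pages, and the way templates are detected (a plain substring test for `{{misspelling of` and `{{only in`) are assumptions made for illustration only.

```python
from typing import Iterable, NamedTuple, Tuple


class Page(NamedTuple):
    """Hypothetical record for one page taken from an XML dump."""
    namespace: int
    title: str
    text: str


# Assumed spelling of the templates that mark pages excluded from the analysis.
EXCLUDED_TEMPLATES = ("{{misspelling of", "{{only in")


def is_redirect(text: str) -> bool:
    """Treat a page whose wikitext starts with #REDIRECT as a redirect."""
    return text.lstrip().upper().startswith("#REDIRECT")


def counts_by_analysis(page: Page) -> bool:
    """Rule from the dump analysis: namespace zero, not a redirect,
    and not a 'misspelling of' or 'only in' page."""
    return (
        page.namespace == 0
        and not is_redirect(page.text)
        and not any(t in page.text for t in EXCLUDED_TEMPLATES)
    )


def counts_by_brackets(page: Page) -> bool:
    """Rule the counter is supposed to follow: namespace zero,
    not a redirect, and containing '[['."""
    return (
        page.namespace == 0
        and not is_redirect(page.text)
        and "[[" in page.text
    )


def tally(pages: Iterable[Page]) -> Tuple[int, int]:
    """Return (count under the analysis rule, count under the bracket rule)."""
    by_analysis = 0
    by_brackets = 0
    for page in pages:
        by_analysis += counts_by_analysis(page)
        by_brackets += counts_by_brackets(page)
    return by_analysis, by_brackets


SAMPLE_PAGES = [
    Page(0, "dog", "# A [[mammal]]."),                   # counted by both rules
    Page(0, "dogs", "# {{plural of|dog}}"),              # no [[, analysis rule only
    Page(0, "cats", "# {{plural of|cat}}"),              # no [[, analysis rule only
    Page(0, "teh", "# {{misspelling of|the}} [[the]]"),  # bracket rule only
    Page(0, "colour-r", "#REDIRECT [[color]]"),          # redirect, never counted
    Page(1, "Talk:dog", "See [[dog]]."),                 # wrong namespace
]

if __name__ == "__main__":
    print(tally(SAMPLE_PAGES))
```

On the sample pages the two rules give different totals, which is the same effect as the differing "millionth entry" candidates above: form-of pages without links only count under the first rule, and misspelling pages with links only count under the second.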

Didn't you even have a banner?

When Wikipedia surpassed 1,000,000 articles a long time ago, they had a banner on the front page commemorating it. Why doesn't Wiktionary? --Takamatsu 04:29, 28 October 2008 (UTC)

I think that when half of the entries are soft redirects written by bots it sort of makes the entry count a bit less significant. It would probably be a short order to write an inflection bot to write thirty thousand grc inflected forms. While Wiktionary would be a better dictionary because of it, the numbers somewhat inflate the importance. -Atelaes λάλει ἐμοί 07:29, 28 October 2008 (UTC)
Our entries are also quite a bit smaller in the average case (whether form-of or not); if one imported M-W (not, because it is copyright, but if one did), that would be 736,000 entries at a go. (we have about 1/2 now, as a WAG) "million" just doesn't mean as much. Now, when we get to (say) 1M headwords (counting entries for each language, not pages, but not counting form-of unless they have defs and examples) then we'd have something to crow about. As of March, that metric was about 437K, depending on details. Robert Ullmann 08:03, 28 October 2008 (UTC)

Standing in for User:TheCheatBot

I wrote a script to extract the missing plurals of English nouns from the XML dump. There are about 3,500 of these at the moment. I ran a totally supervised bot (I had to manually approve each entry before submission) on a hundred and thirty-odd of these in the pre-million dash and the accuracy seems to be pretty high. I ended up deleting three singular pages that should never have existed, and made two further pages uncountable. I don't have the time to do the google books search for so many words (besides which, they blocked me from searching when running manually for acting like a bot). Would there be an interest in me running this as a proper bot? This may mean that a few entries get created which should not exist; however, I would still check the output for any obvious errors. Conrad.Irwin 11:54, 19 October 2008 (UTC)

Yikes! I undercounted just slightly. With a much-improved parser I now find 25,000 missing plural entries. Conrad.Irwin 13:16, 19 October 2008 (UTC)
Hmm ... I was going to suggest we look at the list of 3500. You are finding 25K for English? or a number of languages? A list of English words should be easy for a fluent native speaker to review in advance, likewise for other languages. Robert Ullmann 15:42, 19 October 2008 (UTC)
24K now, with the User:Conrad.Irwin/bad_plurals gone - leaving User:Conrad.Irwin/good_plurals. You'd need a larger vocab than I to make much of a dent in it; but it would certainly be possible to do a quick sanity check. Conrad.Irwin 23:22, 19 October 2008 (UTC)
Doesn't look terribly hard. But where are you getting, say, "albatrosss" from? Robert Ullmann 23:36, 19 October 2008 (UTC)
From neglecting to remember that the pl= parameter cancels out the other parameters (as the pl2= and pl3= don't). Interesting that there are still a number of words ending in 'sss', so maybe I just got lucky with my first sample. Down to 23,000 now. Conrad.Irwin 01:00, 20 October 2008 (UTC)

Along similar lines: I was running some code to add links to various "form of" entries, from the list at User:Robert Ullmann/Not counted/Fof list. This was to increase the pages counted by the s/w. I stopped when we went past 1M, as I had caught a couple of errors, and need to look at the code a bit closer. Should I continue? Robert Ullmann 15:42, 19 October 2008 (UTC)

It might be more beneficial to convert the entries not using the form of templates to use them - or maybe to do both in parallel; they're both low priority. I can get you a list of 12,000 entries that look like form-ofs but don't have templates if you want - but deciding which "form of" template to use will require a bit of parsing fun. Conrad.Irwin 23:22, 19 October 2008 (UTC)
I added two lines to AF to link in some of the form-of cases; the edit summary says "make page count: ...". It only does this if the page doesn't already count. Not terribly useful, but is our general practice. Where can I get your list (which sounds more interesting ;-)? Robert Ullmann 08:07, 28 October 2008 (UTC)
If you're still following this thread it's wikt_cleanup. Conrad.Irwin 01:56, 16 November 2008 (UTC)

Unattestable plural forms of countable nouns

What should I do with plurals of words like replayability, where the plural clearly can exist (when comparing the replayabilities of several games) but doesn't have any use. Similar problem with exothelium and exothelia - but as the singular has such low use I'd be more inclined to include the plural "on faith". (I'm running in manual mode in my bot account for a short time now and then) Conrad.Irwin 01:13, 24 October 2008 (UTC)

My personal preference would be for a "plural not attested" option for en-noun (and analogous templates). Or maybe "plural form Xs not attested," generated by {{en-noun|spec=Xs}}, so that we could show the presumptive plural without linking it. -- Visviva 03:22, 24 October 2008 (UTC)
replayabilities has (very limited) usage on the web in the way Conrad suggests, but not in instances that we accept as durably archived. It seems to me that we only have one relatively permanent and definite state for plurals: attested. The attestation can be challenged, but let's ignore that. Let's also ignore multiple plurals. Very few plurals have been attested and the effort to do so should probably await the development of the "attestor's workbench". At the lemma (singular) entry, we now have five states shown: "uncountable", "blue plural", "red plural", "black plural", "no plural" (omission). Unfortunately none of these at present has any procedure that assures the user of the attestation or other kind of validation of the plural.
I see merit in blue showing only for attested plurals. "uncountable" is sometimes used because a contributor is unsure how to insert the proper spelling of the plural, even in simple cases ("es"). Perhaps en-noun should default to show no plural, with it entering a maintenance category ("no plural shown"). Once a plural is shown by a selection of a particular form, perhaps it should be shown as black until an attested plural entry is entered. Uncountability has its own special problems, but its use when a user is ignorant of template mechanics is not acceptable. If a plural fails RfV (or a similar attestation process), but the plural would follow common rules, a blue-linked asterisk or other superscript would seem good enough.
The obvious biggest problems are the work and the change in contributor habits. En-noun would change. Many existing plural entries, some with non-vacuous content, would be orphaned. There must be other problems, too, but I will leave their discovery to others. DCDuring TALK 09:13, 24 October 2008 (UTC)
I don't think that would be particularly constructive. For the vast majority of English nouns (and other regularly-inflected words in living languages), there is no problem with our usual practice; the plural exists, or occasionally has good reason not to exist, and for most words that's all there is to say. We don't want to complicate garden-variety entries because of a handful of corner cases. On the other hand, I think we do want to do a somewhat better job of handling these (especially because it will give us a better handle on classical and poorly-recorded languages, where these cases are much more common). -- Visviva 12:36, 24 October 2008 (UTC)
is an uncountable noun. In the first 30 hits at google books:replayability, I find plenty of unambiguously uncountable uses, and no unambiguously countable ones. Granted, all uncountable nouns can be countified, but if the plural isn't even attested, then I don't see what's wrong with {{en-noun|-}}. It drives me crazy when editors use {{en-noun|-}} to mean “I don't know what the plural is or whether it exists”, under the (IMHO wrong) impression that an erroneous "this noun is uncountable" is better than an erroneous "this noun's plural is _____"; but I think it's perfectly fine to use {{en-noun|-}} to mean “this noun is uncountable, but English grammar allows uncountable nouns to be countified”. —RuakhTALK 13:25, 24 October 2008 (UTC)
See Appendix:Unverified plurals. Teh Rote 04:04, 16 November 2008 (UTC)

Meaning of numbers in statistics

Can someone please explain the meaning of the table headers in Wiktionary:Statistics? 1. Number of entries 2. Number of definitions 3. Gloss definitions 4. Form-of definitions. How come the total Number of entries (1,003,409) and Number of total pages (1,117,298) are different? Thanks. --Panda10 00:56, 20 October 2008 (UTC)

Pages is a literal count of all pages (including talk pages, wiktionary pages, etc.etc.) entries (in this context) is the number of pages in the main namespace that contain "[[" (the fun of software). Conrad.Irwin 01:02, 20 October 2008 (UTC)
Except that "entries" is not number of pages (not including redirects) that contain [[. It is the number of pages with actual outgoing links as of about two weeks ago plus the number of pages since added or updated to include [[. That is, it means practically nothing. The number of pages containing [[, what it is supposed to be, was 996,337 as of a few hours ago. Is the MW software broken? Yes, seriously. Do they care? No. Robert Ullmann 01:16, 20 October 2008 (UTC)
What is the number of definitions then? There are approximately 1,485,304 definitions in total. How could it be that the literal count of all pages is smaller than the number of definitions? TestPilottalk to me! 12:11, 20 October 2008 (UTC)
"definitions" is not "number of words that have definitions", it is the number of definitions. Some words have more than one, and some pages have more than one word (homographs, in 1 or more languages). Robert Ullmann 15:11, 20 October 2008 (UTC)
More specifically it is the number of lines in the redirect-free main-namespace that start with a "#" that don't continue with a "#*:;" and don't contain {{rfdef}} or {{defn}}. Conrad.Irwin 15:20, 20 October 2008 (UTC)
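As an illustration of the rule just described, here is a minimal sketch that counts definition lines in one page's wikitext. It is not the script behind Wiktionary:Statistics. The function name, the sample entry, and the substring test for `{{rfdef` and `{{defn` are assumptions for illustration, and the caller is assumed to have already filtered out redirects and non-main-namespace pages.

```python
import re

# A definition line starts with "#" and its next character is not one of
# "#", "*", ":" or ";" (those mark sub-senses, quotations and examples).
DEFINITION_LINE = re.compile(r"^#(?![#*:;])")

# Assumed spelling of the request-for-definition templates named above.
PLACEHOLDER_TEMPLATES = ("{{rfdef", "{{defn")


def count_definitions(wikitext: str) -> int:
    """Count lines that look like definitions under the rule above."""
    count = 0
    for line in wikitext.splitlines():
        if not DEFINITION_LINE.match(line):
            continue  # not a top-level "#" line
        if any(t in line for t in PLACEHOLDER_TEMPLATES):
            continue  # a request for a definition, not a definition
        count += 1
    return count


# Hypothetical entry text, made up for this example.
SAMPLE_ENTRY = """==English==
===Noun===
{{en-noun}}

# A domesticated mammal.
#* 1900, Somebody, ''Some Book''
#*: A quotation.
# {{rfdef|lang=en}}
#: An example sentence.
# A second [[sense]].
"""

if __name__ == "__main__":
    print(count_definitions(SAMPLE_ENTRY))
```

Because a single page can hold several such lines across senses, homographs and languages, a sum of this count over all entries can exceed the number of pages, which is the point made in the replies above.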

Let’s put {{t}} in WT:ELE

I just noticed that {{t}} is not mentioned in WT:ELE, but it is in Wiktionary:Translations. I think the template is sufficiently mature now to include it there. Shall I start a vote on this? Something like: include in the dos: use {{t}} and adapt the example below accordingly? H. (talk) 09:29, 20 October 2008 (UTC)

Since nobody objected or commented on this, I have been bold. H. (talk) 11:45, 6 November 2008 (UTC)
That's not how it works. We want explicit discussion before making major changes to policy documents. Further, you changed all the quotation marks to smart quotes, which do not display correctly in some browsers or on some platforms. I have reverted your changes, since they were not approved. You should know better. --EncycloPetey 18:31, 6 November 2008 (UTC)
Which browsers? Which platforms? They are used on fr.wiktionary, and nobody has ever complained. They are normal characters. In cases such as mother's, I think that books never use the character proposed by computer keyboards. Lmaltier 22:07, 4 January 2009 (UTC)
Windows on PC (older but still widely used versions in schools and universities). Your guess about books would be wrong; it varies by publisher just as the presence of serifs in the font does. And, as we often point out, Wiktionary is not paper. What is done in printed books is in no way binding here, particularly when it is a cosmetic issue for print publishers but a technical issue for us. --EncycloPetey 22:28, 4 January 2009 (UTC)
Re: smart-quotes: really? I dislike them myself, but had been given to understand that they're de facto policy, so have started using them myself (well, when I bother). They're enforced by templates such as {{term}}, and encouraged by their presence in the edit-tools. But if we can get a push to use normal-person quotes, I'm totally behind it. :-)   —RuakhTALK 20:07, 6 November 2008 (UTC)
Please don't use them. It makes it harder to use wikt. I lost my place on the page trying to find this section, but when I typed ctrl+F "let's" into the search box, it didn't find it - so I had to scroll all the way back up and click on the link. What is the reason for preferring these ugly, less-widely-supported characters over the nice normal ones? Conrad.Irwin 12:57, 13 December 2008 (UTC)
I support putting the {{t}} into WT:ELE as the recommended method, and as far as I am concerned H. was right to be bold after more than two weeks. I agree with EP though that "smart" quotes shouldn't be used. Thryduulf 19:51, 6 November 2008 (UTC)
Don't misunderstand. I agree that {{t}} ought to be explained in ELE, but we've held for a longer time now that major changes to the ELE text require a vote. This would be a major change, and it would certainly be preferable to hash out any discussion of problems with the particular wording before altering our primary policy document. --EncycloPetey 20:09, 6 November 2008 (UTC)
I do not remember that a vote is needed for each change to WT:ELE. I do agree that it should be discussed, and that was what I did here. Since there were no objections, I acted. Note that I explicitly asked whether a vote was needed, you could have reacted on that earlier. Feel free to start a vote saying that every change to WT:ELE should require a vote and I’ll be happy to oppose. Anyway, I started a vote about this now: Wiktionary:Votes/pl-2008-12/t template in WT:ELE. If there are no objections to the wording, I’ll remove the premature tag in a few days.
Also note that it was not a coincidence that I made two edits to the page: one for the quotes, one for the template. I can see why you reverted the template change, but I do not agree with the quote symbols. They are common practice here (as opposed to wikipedia, I’m afraid), so I will reinstall that. H. (talk) 12:26, 13 December 2008 (UTC)
Read the top of that page. It should not be modified without a VOTE; the last few people who have tried have been blocked temporarily for doing so. The quotes are not an uncontested change, they should be discussed - and abandoned in favour of common sense (imo). The {{t}} template is probably uncontested, but it is a big change and so should be voted on as explicitly said on that page. You should also note that silence doesn't mean "yes". Conrad.Irwin 12:57, 13 December 2008 (UTC)
Ok, I apologize for not reading that header carefully. Re the quotes: djeezes, making trouble about an esthetic improvement.
Silence indeed doesn’t always mean yes (contrary to the (Dutch only?) proverb zwijgen is toestemmen), I even have a little book about pragmatics with the title “Zwijgen is niet altijd toestemmen” (being silent is not always agreeing, badly translated). Ah well, I guess here comes a totally useless vote: Wiktionary:Votes/pl-2008-12/curly quotes in WT:ELE. The wording is probably still a little too polemic, feel free to disarm it, or tell me. In a few days’ time I’ll probably be cooled down enough to rephrase it. H. (talk) 16:05, 13 December 2008 (UTC)
Regardless of whether I agree with the specific changes (my views are, I think, adequately expressed at the two votes), I cannot fault Hamaryns for being bold (perhaps even recklessly bold) in pursuing an issue which he thought important. His edit to ELE seems to be the only thing which is causing these issues to be sorted out. -Atelaes λάλει ἐμοί 22:39, 4 January 2009 (UTC)
Since there were no more comments, I started the votes: WT:VOTE. H. (talk) 14:40, 4 January 2009 (UTC)

Inconspicuous links to other projects

Is there a template or a template parameter to let an entry link to other Wikimedia project only by inserting an inconspicuous link at the left, without creating any other visual element? What I mean is such a link that can be seen at commons:dog, where it links to Wikipedia.

I would like to use such a template for linking to Commons. The conspicuous links to Wikipedia created by {{wikipedia}} and {{pedialite}} are already common and got used to, so I would keep the practice of using {{wikipedia}} for Wikipedia. --Dan Polansky 19:05, 20 October 2008 (UTC)

I don't think there is one, and personally, I don't think I'd like there to be. What's wrong with {{projectlink|commons}}? —RuakhTALK 21:01, 20 October 2008 (UTC)
The template {{projectlink|commons}} is okay, but it takes away four lines, uses icons, and uses boldface, so the result it produces is quite conspicuous, IMHO anyway. By contrast, the links that the very same template creates at the left are inconspicuous, and easy to find once the user gets used to looking there for them. (An example of use of the template: United States.) The template is placed under "External links" heading, which it only should, and yet, the links are external to Wiktionary while internal to the group of Mediawiki projects, unlike links to, say, Webster's dictionary.
Also, what I find strange is that some of the items created by the template are sentences, not terms denoting external resources. What I mean is that the link reads "Wikimedia Commons has media related to “United States”." and not "Media related to “United States” at Wikimedia Commons". For comparison, we have at knowledge: ""knowledge" at The Century Dictionary, The Century Co., New York, 1911.". But that is a different topic. --Dan Polansky 07:03, 21 October 2008 (UTC)

Site ranking

According to Alexa we are now in the top 1000 sites. en.wikt is almost half the overall traffic for the wikts.

(note that the rank displayed at the top is a 3 month average; look at the graph and the daily numbers ;-)

About 1 in 1000 of all global Internet users use the Wiktionaries in the course of a given day. Not bad. Robert Ullmann 17:38, 23 October 2008 (UTC)

42% of the traffic is to enwikt. gets more than four times as much traffic as enwikt. 27% of users come from US, UK, Canada, and Australia. More than 75% of users come from those 4 countries. English-only has more traffic from every English-speaking country that I can see stats for than all DCDuring TALK 22:35, 23 October 2008 (UTC)
Yeah, one of the huge advantages of en.Wiktionary over M-W is that it has translations, plus it can define words for a variety of languages. Plus, if a person speaks more than one language, Wiktionary can sometimes provide word definitions in other languages. TestPilottalk to me! 19:40, 25 October 2008 (UTC)
We seem to have backed into a de facto market strategy of serving non-native speakers. The other side of offering the translations and prominent pronunciation section is to make the site less attractive for most native speakers. This would be particularly true for the very large number of mono-lingual English speakers in the US. Such users would find MW good for "serious" use and Urban Dictionary for trendy slang. No wonder that I have to explain Wiktionary to my friends. DCDuring TALK 20:15, 25 October 2008 (UTC)
Native speakers outside the United States (which is the majority), find translations and such very useful, as they are often using several languages. UK and Aussie/NZ might be more mono-lingual, but native English speakers in India will pretty much always also speak Hindi or whatever; English speakers here all speak Swahili as well (the evening news switches back and forth whenever someone is interviewed in the other language, with no captions or pause :-). I like the fact that we are much "bigger" than M-W and such. Robert Ullmann 21:49, 25 October 2008 (UTC)
You are looking in the wrong direction. The reason why en.Wikt is not as popular as M-W is that it is not as good. Yet. It is really easy to see that M-W has more definitions and even headwords (for English). But Wiktionary is catching up, and it is a matter of time until it becomes the best dictionary around, just as happened with Wikipedia vs. Britannica. But multilingual capabilities do bring more users, native speakers included. Without them, there wouldn't be even half of those 27% around, not to mention the rest. TestPilottalk to me! 09:29, 26 October 2008 (UTC)
We are certainly not as consistent as M-W, but why are we relatively more successful among those located outside of English-speaking countries, especially the US? Is it just because of interwiki links?

I doubt that it is merely completeness that sets us back among native English speakers. Is it our layout (prominence of pronunciation, incredibly long tables of contents forcing users to page/scroll down before finding English definitions)? (Note what OneLook does, providing a list of definitions down the right-hand side in addition to the list of entries from different sources.) Is it the obsolete wording of some of the old Webster entries? Is it lack of consistency in presentation? Is it unreliability of entries? DCDuring TALK 14:26, 28 October 2008 (UTC)

Let's get the statistics straight. M-W gets more than 71% of its traffic from North America, primarily from the USA, with Great Britain plus Australia (just above 4% in total) being insignificant markets. Why is Merriam-Webster so heavily targeting the States? My guess: since they are oriented toward making money, they go where most of the money is, the largest economy in the world. Now let's look at enWikt. It has a 42% share of total Wiktionary traffic. 27% of total Wiktionary traffic comes from US+UK+CA+AU. That means enWikt gets 61+% from those 4 English-language countries (if we assume that the majority, 99+%, of those who come from US/UK/etc. go to enWikt). 14% difference compared to Now let's look at the North American share of Wiktionary: only ~47% for US or ~53% for US+CA! A far cry from M-W's 71%. Now check the English-speaking population of the US and CA against that of the rest of the world. You can clearly see that Wiktionary traffic is distributed more evenly. Should enWikt primarily target Americans? It might make sense donation-wise. But I would rather say NO. It is doing fine as it is so far.
The reason for not being on top is not "merely completeness" of Wiktionary. It is the overall quality of definitions, plus lack of trust, the huge brand recognition of the Merriam-Webster trademark (100+ years old), the lack of examples/pronunciations/etymologies sections in a lot of Wikt entries, and so forth.
As for "long tables of contents", that could be fixed in a really easy way. Let's make it collapsed by default for anonymous users and remember the state for registered users. As simple as that. I doubt, though, that it is a real problem. Prominence of pronunciation? You must be kidding. There are dozens of projects around that take a minimalist approach to the user interface of dictionaries. The best web-based one, IMHO, I personally doubt that it has many users. Writing something like NinjaDic with a Wiktionary backend could be done in no time, and most likely is out there somewhere. I myself, by coincidence, happen to have one such project. Download WikiLook, and it will define words for you using enWikt without pronunciations, translations, tables of contents and so forth. WikiLook even goes one step further: it lets you check definitions without opening any extra tabs or new windows in your web browser, with easy access (just a click) to the full definition page in case you need it. And that is just a couple of the gazillions of possible ways of approaching the end user (keep in mind, they have different needs by definition). With open source communities catching up, there will be even more options around. TestPilottalk to me! 20:05, 28 October 2008 (UTC)
Here are the simple facts of wikt market share relative to mw:
US: 20%; UK: 82%; Canada: 46%; India: 46%.
In the UK, is a popular site (more popular than wiktionary) that gets 82% of its reach from its dictionaries.
MW has a high portion of its traffic (79%) from these 4 countries because it only offers English. Some of's traffic in these countries is attributable to offerings from de.wikt, fr.wikt, etc.
In English wiktionary faces competition from,,,, wordnet,, artfl, and others.
Our multi-lingual offering ("all words in all languages"), together with whatever traffic the other wiktionaries get in English-speaking countries, don't seem to help us gain share against the monolingual dictionaries in any English-speaking country. I don't know what the implications of this are for us, but it might make it easier to understand why we don't seem to get much attention from WMF developers or from the US media and the public. DCDuring TALK 01:18, 29 October 2008 (UTC)
I agree, and I think it is looking very positive. These are huge figures, and we're not anywhere close to a “finished” dictionary yet.
By the way, do we have any way of gauging the completeness of basic vocabulary in Wiktionary? I wonder how we compare to, for example, a grade-school dictionary or small college dictionary. Michael Z. 2008-10-28 02:41 z
List of defined entries from big editions. You can see how many are redlinks here. On the other hand, I would not call most of them "basic". From time to time I come across real-world words that are not defined here and are defined in Google or And, for comparison: in 2004 there were 50K entries here in total, and you could hardly score a hit unless you were searching for truly basic words. TestPilottalk to me! 12:59, 28 October 2008 (UTC)
I'd like to see lists from some small and medium editions, although I suppose those might not have been deemed as valuable to compile. They could give us some milestones to shoot for on the way to the OED. Michael Z. 2008-10-28 16:16 z
I guess large word lists are a good start: simple:category:Wordlists, WT:FREQ. Michael Z. 2008-10-28 17:26 z

Proverb Categories

There are two categories for English proverbs: Category:Proverbs and Category:English proverbs. If {infl|en|proverb} is added to the inflection line, the entry will be automatically in Category:English proverbs. Can I move all entries from Category:Proverbs to Category:English proverbs? --Panda10 22:15, 24 October 2008 (UTC)

Why not the other way? DCDuring TALK 20:43, 25 October 2008 (UTC)
It would make no difference to me, but there are a couple of reasons: Proverb is a POS, so it is a grammatical category just like nouns and verbs. Also, the category for foreign languages is "<lang> Proverbs", not "xx:Proverbs" (see Category:Hungarian proverbs vs. Category:hu:Proverbs). It would be helpful to keep it consistent. Plus using {infl} automatically puts the entry in English proverbs. --Panda10 20:52, 25 October 2008 (UTC)
Yes, it is like other POS. Should be Category:English proverbs and in Category:Proverbs by language. Which it is. Robert Ullmann 21:42, 25 October 2008 (UTC)
Agreed. Panda10, please do. —RuakhTALK 05:37, 26 October 2008 (UTC)
Actually, I don't think "Proverb" is really a part of speech in the grammar of the English language at all. I prefer just to use "phrase" as the POS and add the category Category:English proverbs manually for the English phrases which clearly have proverbial status (which can sometimes be a bit of a judgment call). -- WikiPedant 18:54, 26 October 2008 (UTC)
"Phrase" isn't a "true" or "classical" PoS either. Both "Phrase" and "Proverb" are in use as Wiktionary PoS headers. Phrase seems to have higher status, but Proverb seems superior to me in specificity. Most occurrences of "Phrase" could be replaced with a "classical" PoS, but not any properly applied Proverb headers. However, it is probably more important to have an entry in the Proverb category than under a Proverb PoS header. DCDuring TALK 20:17, 26 October 2008 (UTC)
I agree that "proverb" isn't really a grammatical POS, but to me "phrase" implies "not a clause". I wouldn't consider "It takes all kinds to make a world" to be a "phrase", for example. Even something like "Don't count your chickens before they're hatched", which technically is a single verb phrase with at its head, isn't really well described by the "phrase" POS IMHO. —RuakhTALK 22:53, 26 October 2008 (UTC)
Personally I try to avoid using "phrase" as POS, and would advocate deprecating it. I can't even remember the last time I used it. I like to use "proverb" if it really is one, but the category seems to have a number of entries that are not in the form of the set proverb e.g. one who hesitates is lost (anyone searching would start this with "He") and some simply are not proverbs, full stop. Back to the main point, I believe that Category:English proverbs is the correct heading for the reasons outlined above, and also because many of them have direct translations to similarly worded proverbs in other languages. --ALGRIF talk 10:16, 28 October 2008 (UTC)
"Phrase" has quite a range of meanings. In its broadest sense, "phrase" means "a particular choice or combination of words used to express an idea, sentiment, etc., in an effective manner" (OED). Sometimes, "phrase" is understood in a more limited way as any syntactic unit which does not contain both a subject and a verb, ruling out clauses and complete sentences. And there are, of course, a number of other valid senses of "phrase" (some of them quite technical). I still see "phrase" (understood in the broadest sense) as the best generic term we have for any group of words (including complete sentences) which cannot be classified under any other part of speech. Even if we allow the use of "Proverb" as a POS (more like a pseudo-POS, I'd say), there are still some entries which defy all other POS's except "Phrase" -- such as that's the way the cookie crumbles, so far so good, that's the way the ball bounces, or ladies first, all of which are currently categorized as proverbs, but, as Algrif says, "simply are not proverbs, full stop." But, to return to my original point, I'm still wary of using "Proverb" as a (pseudo-)POS and prefer to use "Phrase" with the manual addition of Category:English proverbs for that subset of phrases which are incontrovertibly proverbial. -- WikiPedant 18:33, 28 October 2008 (UTC)

The move is completed, thanks Dan. I'd like to delete the now empty Proverbs category and its talk page. I checked the What links here page, and modified a couple of pages. The rest of the links are talk pages. --Panda10 17:56, 28 October 2008 (UTC)

Featured Words

Wiktionary's Word of the day currently just points out "interesting" words, for expanding one's vocabulary; we have no equivalent to Wikipedia's Featured articles, which are not necessarily the most interesting articles, but which are comprehensive, well-written and referenced. Would other editors support the creation of a Featured Words project to do something similar for Wiktionary? --Ptcamn 21:35, 25 October 2008 (UTC)

I seem to remember that when Word of the day started, we said that all words chosen were to be of a reasonably good quality, with an etymology and at least some translations. Or is my memory even worse than I thought (quite likely)? SemperBlotto 21:59, 25 October 2008 (UTC)
I don't know how it started, but right now, it seems to be the other way 'round: whoever's doing WOTD (IIRC: EncycloPetey until recently, and currently Circeus) will select an entry because the word is "interesting", and then bring it to a good quality before the date arrives. That's my impression, anyway. —RuakhTALK 05:41, 26 October 2008 (UTC)
Even the earliest entries, for the months before I got involved, were selected as "interesting" words rather than as quality articles. Part of my rationale for keeping WOTD that way is that there is a tradition (at least in English-speaking countries) that a "word of the day" should be such an interesting and vocabulary-building word. Calendars, newspapers, and even other on-line dictionary sites do it that way because that's what a reader expects it to be. This is a draw for people who enjoy learning. In contrast, the Wikipedia Featured article can feature their best articles, rather than just something offbeat or unusual, because they are an encyclopedia web site, and so have articles on topics rather than entries on words. No matter how good our entry on the name Cyrus became, it would never draw in the number of views that WP could get featuring an article on Miley Cyrus. A topic can have currency, breadth, and immediacy (even on a historical topic) that just isn't easy to put into a dictionary entry, nor should we try to change that. Second, our entries are, by necessity, much heavier on adherence to format than WP articles, and so what makes our entries "featurable" is fully expanded format. That's something that's just not as interesting to the general public. Yes, it's worth having our entries on , , and as rich and well-done as we can, but a well-done article on such a word, even if rich with useful content, carries no general appeal. Third, a Wikipedia article, because it covers a topic, can begin with a summary section, and it is from this summary section that their main page copy portion is produced. What would we place on the Main Page to draw in reader interest? We can't summarize the content of one of our entries, because it's partitioned into discrete sections, each one of which is intended to contain and present a specific kind of information. You can't summarize the Quotations, Synonyms, Derived terms, Translations, etc. the way that Wikipedia is able to summarize the subsections of one of their articles, so there just wouldn't be a way to present such an article on the Main Page. The underlying functions of a dictionary and an encyclopedia are very, very different. The way in which WP's featured articles differ from our WOTD is just one reflection of those differences. --EncycloPetey 18:50, 6 November 2008 (UTC)
I'm not convinced that the time wasted to work out what makes an entry featurable and then to debate whether pages conform to the criteria would be best spent thusly. I do think that to have a list of a few well constructed entries with varying levels of detail would be useful to point newbies at, and that's pretty much the same thing. Conrad.Irwin 08:41, 26 October 2008 (UTC)
I tend to agree with Conrad.Irwin. Wiktionary editing tends to be rather more distributed than Wikipedia (when was the last time you saw anyone spend a week on a single entry........and thought it time well spent). Additionally, it seems like few, if any, people are good at everything. Some folks write amazing defs, some verify esoteric words, some write code, some put out fires and generally make up for the lack of diplomacy most of us have, etc. I have yet to see a single editor whom I would think capable of writing that perfect entry all by themselves. Couple that with our general inability to coordinate and stick to tasks (I run out of fingers and toes counting off failed projects, and simply run out of ideas trying to come up with one which involves more than one person and is completed in a timely fashion), and I think it unlikely that we're capable of creating a new such entry every day. Speaking for myself, I'm far too self absorbed to get bogged down by deadlines and such. I think that having a few entries brought up to amazing standards as examples for.....well.....everyone (not just newbs) would be an excellent idea (and it might not be a bad idea to make a habit of getting one for every language) and time well spent. Just a point of clarification, I don't think that our project sucks, no matter how much my preceding comments might seem to indicate. :P -Atelaes λάλει ἐμοί 09:04, 26 October 2008 (UTC)
Very true, and very well put. —RuakhTALK 17:33, 26 October 2008 (UTC)
Likewise. That was one of the goals behind two of my personal pet projects:
  1. The Model Pages project, which is more modest right now than when I first conceived it, but which I maintain and for which I continue to seek help now and then. The result is that we have really good models for simple situations on a common noun (), proper noun (), and verb (), as well as one slightly more complicated case (). I've also made a start on a few non-lemmata to show a little of how to take these pages beyond the bare minimum (which is something that even an inexperienced new user can contribute greatly to).
  2. The Substantive nouns primer for Latin. I took a variety of Latin nouns chosen as if I had been looking to make an ABC-book for children, so I selected everyday nouns (well, everyday for Caesar's Rome) which could be illustrated easily. I also made sure to represent various genders and declension patterns so those would be modelled for editors. Each of these has been expanded as much as I've (so far) been able to manage. I went one step further and also beefed up the corresponding entries on Victionarium at the same time, thus ensuring that people who went looking first on the Latin edition of Wiktionary, but who desired an English explanation, would be able to follow a trail here.
There is ample opportunity for similar projects using additional articles or in additional languages, but my experience is that you should expect to tackle them almost single-handedly. At best, you'll be getting support only when you make a direct personal appeal to an expert or specialist on a scale that they can manage easily (such as asking for a translation into a particular language). Ultimate success will depend on your own personal unflagging enthusiasm and effort. --EncycloPetey 19:09, 6 November 2008 (UTC)
Not disagreeing, but we should somehow encourage folks to tackle revising the big entries, many of which would be an embarrassment as WOTD, being no better than the Webster 1913 entries they started from. What Visviva did for head is something that needs to be done for many entries to serve our core language-learner user base. A single person may need to tackle each, but will certainly need assistance. DCDuring TALK 20:27, 26 October 2008 (UTC)
I completely concur with the proposal for "featured word". After I encountered some adversities when expanding the etymologies, I was forced to insert references, and it has been my habit ever since. It would be good if thorough referencing in the "etymology" section counted towards "featured" status, provided that the article is treated in detail and with diligence. Bogorm 11:06, 27 October 2008 (UTC)
It seems to me that you are advocating reactivation of {{COW}}. If so, I would agree. I'm not sure why it was deactivated in the first place. -- ALGRIF talk 10:07, 28 October 2008 (UTC)
I think it's because no one was laboring except for EncycloPetey, and "solilaboration of the week" didn't have the same ring to it. —RuakhTALK 19:01, 28 October 2008 (UTC)
The COW died from serial solitary laboring. Connel tried to keep it going for a while, and around the time he burned out on it (for not getting any sustained community involvement), I stepped in. Then I burned out for the same reason, so Davilla stepped in and subsequently was just as disillusioned. The French Wiktionnaire has had the same poor results. In late 2005, they started an Articles de qualité, which has grown to include only 14 articles since that time. While the idea is a good one in principle, in practice it just hasn't worked. Chalk it up as one other way in which Wikipedia and Wiktionary substantially differ. --EncycloPetey 18:38, 6 November 2008 (UTC)

New subcategories for English given names

Category:Male given names and Category:Female given names have over a thousand names each. A few hundred names are listed in the subcategories by origin. It's a confusing set-up. You have to keep clicking pages to find secret subcategories that are almost empty. I would like to sort all English names into subcategories (why have them at all otherwise?) and for that, I would create the following new subcategories:

  • Diminutives of male/female given names. Pet forms are quite separate from formal given names in many languages.
  • Male/female given names from surnames. The place name Shirley derives from Old English, but it wasn't a personal name then, so it's misleading to list it with Edith and Mildred.
  • Male/female given names from place names (when they are not surnames). Brittany, Erin, Shannon etc.
  • Male/female given names from English. April, Heather, Pearl, Earl. Where the word originally comes from is beside the point.
  • Male/female names of artificial origin. Vanessa, Belinda, Jayden, Deshawn etc.
  • Male/female given names used in India (or: from India). They cannot be called romanizations since the original language often isn't mentioned. They may be used in several Indian languages. Hopefully some day somebody comes along who can sort them into sub-subcategories.
  • Romanizations of Russian/Greek/etc male/female given names. See the example Nikita. I think it's an error to call them English. The subcategories should go to the original languages.

Robert Ullmann wants to change the names of all given name categories from topics into parts of speech, so "English" would be prefixed to the above. It would be a good chance to get rid of erratic categories. E.g. Category:Male given names from Greek was originally a mistake for Ancient Greek and is now used for romanizations. --Makaokalani 09:33, 27 October 2008 (UTC)

Sounds like the right direction.
Do endearing names correspond to diminutives in all languages? (They mostly do in Slavic languages.)
I'm not so sure that's the right way to treat the “romanizations”. The way we treat most terms, the Russian entry would have Никита in Cyrillic. Not sure where Nikita in Latin letters belongs—perhaps under several language headings or “translingual”, but it's not in Russian orthography, and it is used by people who don't speak a word of Russian. Michael Z. 2008-10-28 02:34 z
You are right. A romanization is a part of speech (it IS a romanization, not ABOUT romanizations) so it should have a language, and Nikita isn't a Russian romanization. Can parts of speech be translingual? "Category:Translingual romanizations of Russian male given names", and a parent category "Translingual romanizations of given names"? Does anybody except a Wiktionary editor understand what a translingual romanization is? Each time I try to bring order to the given name categories the matter gets more complicated.
A diminutive would be any endearing or pejorative name derived from a person's official given name. They are different in each language. In Finnish almost anything beginning with the first letters of the given name will do, and only the commonest ones are worth recording. It's a much needed category. Right now Spanish Pepe and English Jim are defined as given names instead of diminutives. When an English name is both it can be explained in the entry. Betty = "A diminutive of Elizabeth, also used as a formal given name."--Makaokalani 11:13, 29 October 2008 (UTC)
But are romanizations common to all languages using the Latin alphabet? (Or am I now confusing romanizations with transliterations again?) I was just thinking of the Russian name Юрий which, in en:wp, seems to be transliterated/romanized variously as Yuri, Yury and Yuriy, but which in Swedish is spelled Jurij, and according to w:Yuri Gagarin is also spelled Joeri, Juri, Iuri, Youriy,.... depending on the target language. Hence I don't see how you mean there is a translingual romanization of this particular name. \Mike 11:51, 29 October 2008 (UTC)
I wish there weren't any romanizations in the Wiktionary. I'm just trying to sort them somewhere. "Translingual" wouldn't mean it appears in all languages, only in several of them. I could also define Nikita as an English romanization. But then we'd need an entry for every romanized language: Nikita in Finnish, Swedish, French, Kiswahili... or could we make a rule that only English romanizations are allowed? Maybe I'll just call them English and worry about it later. --Makaokalani 12:04, 29 October 2008 (UTC)

Verb forms

I'm eager to add some Czech verb forms here, but I don't know which way to do it. I had a look at the entries in Category:Spanish verb forms, Category:Portuguese verb forms, Category:French verb forms, Category:Latin verb forms, Category:Finnish verb forms, and Category:Italian verb forms, to see which looked nicer, and I think that the Portuguese, Spanish, Latin and Finnish verb-form entries are the best, all nicely templated. If I want to add these Czech verb forms, which style would you recommend that I use? --Ro-manB 17:16, 27 October 2008 (UTC)

Also, most of these were added by a bot - who can use their bot for Czech entries? --Ro-manB 17:16, 27 October 2008 (UTC)
When you're looking at formats, just remember that the Spanish verb forms have become over-categorized. Ideally we'd have them all under one category: just Spanish verb forms. So if you add forms, try to keep the categorization simple. Nadando 23:35, 27 October 2008 (UTC)
There are already some Czech conjugation templates. I don't have many resources on Czech grammar, so I can't judge them properly. However, if they are correct (for example, see nést), there are only three tenses (past, present and future), two numbers (singular and plural) and three persons (first, second and third). That sounds fine to me, whether you want to create one category for each of the 18 possible forms or just keep all of them at Category:Czech verb forms. If the conjugation templates are incorrect or lacking important details, I suggest you first edit them. Then, you could create entries for verb forms using the {{form of}} template, as explained at Wiktionary:About Czech; or create a new template specifically for Czech entries that should generate simple definitions (e.g. "First person singular present tense of doufat.") and add the entry to the corresponding categories. Daniel. 01:48, 28 October 2008 (UTC)
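A minimal sketch of the count mentioned above, assuming the grid really is just the three tenses, two numbers and three persons listed in the comment. The function name form_definitions is hypothetical, and this is not an existing Wiktionary template or bot; it only illustrates how the 18 combinations and their one-line definitions could be enumerated.

```python
from itertools import product

# Assumed grid, taken from the comment above; real Czech conjugation
# tables may need more categories than these.
TENSES = ("past", "present", "future")
NUMBERS = ("singular", "plural")
PERSONS = ("first", "second", "third")


def form_definitions(lemma):
    """Return one definition line per tense/number/person combination."""
    return [
        f"{person.capitalize()} person {number} {tense} tense of {lemma}."
        for tense, number, person in product(TENSES, NUMBERS, PERSONS)
    ]


if __name__ == "__main__":
    for line in form_definitions("doufat"):
        print(line)
```

Whether each line then goes into its own category or into a single Category:Czech verb forms is a separate choice, as the comments in this thread discuss.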
Re: over-categorization: Sorry, I think that was my bad. I've never understood the point of these ginormous categories, and apparently I didn't even understand how they were supposed to work. —RuakhTALK 15:11, 28 October 2008 (UTC)
I think that the point of these ginormous categories is simply to label an entry: to generate a small text (Galician verb forms | Portuguese verb forms | Spanish verb forms | Swedish nouns) that ought to summarize all languages and parts of speech involved. This is good for one reading an entry, but I don't think that this user will ever click on these category links and find much more information than "Wiktionary has 186,484 entries for Italian verb forms up to date". If someone is, for example, studying Spanish conjugation, he or she might want to see a category full of entries ending in -ríamos. The problem with the current Category:Spanish verb forms is: this user probably won't find anything useful easily either. There are many categories, often describing parts of more subcategories leading to a main fully-detailed one, as in: Spanish first-person verb forms > Spanish first-person future verb forms > Spanish first-person future subjunctive verb forms > Spanish first-person singular future subjunctive verb forms (compare with this: Spanish first-person forms > Spanish first-person future forms > Spanish first-person singular future indicative forms). There are also many discrepancies between them, like "Spanish verb forms"/"Spanish:Conjugated verb forms" and "Spanish present participles"/"Spanish gerunds", but even if all categories matched completely, this still doesn't seem to me a good system: We would need 71 categories just for the indicative persons, numbers and tenses (including conditional); if the subjunctive mood is included, the number would be 201, and so on. And, to prevent over-subcategorization (e.g., a category "Spanish present verb forms" pointing directly to all the present-related categories), there would be up to four empty categories leading to every populated category. In my opinion, a better way to fix this situation would be to include all populated categories (Spanish first-person future indicative...) directly on Category:Spanish verb forms, i.e., to delete all its empty subcategories; or do as it is done for Category:Portuguese verb forms, i.e., to organize them just by mood, number and person. Daniel. 04:43, 4 November 2008 (UTC)
I disagree. No one studying verb forms would find a category full of "entries ending in -ríamos" helpful. Our usual policy is not to have any of these subcategories, and to just have Category:Spanish verb forms. Spanish verb forms are over-categorized as a result of an early bot run that resulted in many incorrect and badly formatted entries that we are still cleaning up many years later. --EncycloPetey 08:07, 4 November 2008 (UTC)
Most of these entries are automatically included in the categories by templates, including Spanish informal second-person plural conditional forms of -ar verbs and more. So I'd like to ask how you editors can be cleaning up badly formatted entries if the source of the problem is still there; instead, I will focus on the main problem of incorrectness. After seeing some dozens of entries, most seem correct, but not uniform (Why are some third-person forms labeled in separate definitions as dialects used only with "usted", while others have a single definition "also used with usted"? Why are some imperative forms lacking when identical to the present subjunctive, and others not?), so could someone please say whether there is any policy for how to format all Spanish conjugation entries? As for the usual policy [...] not to have any of these subcategories, does this apply just to languages with more than twenty variants? If not, Category:English simple past forms and Category:English archaic third-person singular forms should not exist either. Daniel. 15:08, 9 November 2008 (UTC)

Wiktionary:Votes/sy-2008-10/User:Gauss for admin

RuakhTALK 22:04, 28 October 2008 (UTC)