

Help with German verbs

There are now more than 200 verbs in Category:German verbs needing inflection. Adding these inflection tables is beyond my limited knowledge of German. Please help, if you can. SemperBlotto (talk) 10:13, 1 March 2013 (UTC)

If the German templates work anything like the Dutch ones, you could probably do most of them by starting off with the base verb and specifying a parameter for the separable part. For example, abbremsen inflects the same as bremsen, with the additional part ab. So it would be: {{de-conj-weak|brems|gebremst|h||a|ab}} CodeCat 14:41, 1 March 2013 (UTC)
But if I make mistakes in German, I get moaned at. I'll leave it to my betters. SemperBlotto (talk) 08:20, 2 March 2013 (UTC)
The German conjugation templates are pretty terrible at the moment. Their syntax should be simplified when (if) they get converted to Lua. -- Liliana 09:28, 2 March 2013 (UTC)
Agree with Liliana. The German conjugation templates are a mess, and the declension templates are hardly better; converting them to Lua seems like the perfect opportunity to make them user-friendly rather than the terrifying, baffling quagmire they currently are. —Angr 09:32, 2 March 2013 (UTC)
Unfortunately I am completely clueless on how Lua works, else I would have started a draft long ago. -- Liliana 13:50, 2 March 2013 (UTC)
I may convert them some time in the future, using the current Dutch templates as a base. Could you read Template:nl-conj-wk/doc and list anything that doesn't apply to German, as well as extra things that are needed for German that don't apply to Dutch? —CodeCat 14:13, 2 March 2013 (UTC)

What I'd wish for is that the syntax is consistent for all four conjugation templates we have for German, that is {{de-conj-weak}}, {{de-conj-strong}}, {{de-conj-weak-eln}} and {{de-conj-weak-ern}}. With Lua, we could probably do away with the two parameters that just set whether the stem ends in d/t (like in Dutch) or s/z/ß (doesn't exist in Dutch, affects forms like second person singular), because they can be automatically derived from the stem. Verbs like laden seem to be completely irregular in German and cannot be conjugated by regular means; they should probably fall back to the irregular template (unless you have a better idea?).

Thus, the first three parameters should be 1. the stem, 2. the past participle (completely irregular in German, cannot be derived from the verb at all) and 3. the auxiliary verb. The separable parameter should probably be named sep= like in the Dutch template. Non-separable prefixes are treated in our German templates as if they were part of the stem; dunno if that needs to be changed.

About strong verbs we need to think some more, because that's pretty difficult and widely different for every verb. We also need to see how to integrate {{de-conj-pp}}, the oddball one out of the German conjugation templates, into the whole thing. I guess I need to analyze German verbs for that to see how they conjugate. -- Liliana 14:56, 2 March 2013 (UTC)
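As an illustration of the point above that the d/t and s/z/ß flags could be derived from the stem instead of being passed as parameters, here is a minimal Python sketch. The function names are hypothetical and not part of any existing template or module; it only covers the two stem endings mentioned in the discussion and only the second-person singular present of weak verbs.

```python
def stem_flags(stem: str) -> dict:
    """Derive the two ending flags from a weak-verb stem (hypothetical helper)."""
    return {
        # Stems in d/t insert an -e- before consonantal endings.
        "dental": stem.endswith(("d", "t")),
        # Stems in s/z/ß take -t instead of -st in the 2nd person singular.
        "sibilant": stem.endswith(("s", "z", "ß")),
    }


def present_2sg(stem: str) -> str:
    """Second-person singular present of a regular weak verb."""
    flags = stem_flags(stem)
    if flags["dental"]:
        return stem + "est"
    if flags["sibilant"]:
        return stem + "t"
    return stem + "st"


print(present_2sg("arbeit"))  # arbeiten
print(present_2sg("brems"))   # bremsen
print(present_2sg("mach"))    # machen
```

Verbs such as laden, discussed in this thread, would still need separate handling; the sketch makes no attempt to cover them.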

The non-separable prefixes have a parameter because the script needs to know when to leave out the ge- of the past participle. I know that this happens in German too, so the German templates would also get such a parameter. It has another advantage, too. The way the Dutch script currently works, it first conjugates the base verb, and only after that does it add the prefix and separable part. This means that if the script is coded to handle verbs like laden, then with a separate prefix parameter the exact same code could also handle entladen and all other verbs that have -laden, without any changes at all.
The German equivalent of {{nl-conj-wk-cht}} is the "Rückumlaut" verbs (they have mostly become strong in Dutch), but they are handled more or less the same, and should probably get their own template/function in the script.
Strong verbs in Dutch currently take three required parameters: present stem, past stem and past participle stem (without ge-). I think that this can be applied to German as well, but German also has umlaut in the present 2nd and 3rd singular, and I think also in the past subjunctive, so this may need two extra parameters.
I think that the verbs in -eln and -ern could probably be unified with regular weak verbs. Is this special type of inflection automatic in German, meaning that any verb with a stem in -er- automatically inflects that way?
The script has allowed us to drastically reduce the number of verbs that need special treatment in Dutch. gaan and doen, which are otherwise pretty irregular, can be handled by the same script that handles all other strong verbs without any extras. A few verbs have specific but minor irregularities, like zeggen (with two past tense stems) and houden (with an alternate 1st singular), but these are treated as regular weak/strong and the script simply "knows" that those verbs need slightly different treatment. Only the preterite-present verbs (which are too irregular compared to each other to be all treated together) and a few other verbs (hebben, zijn, willen) are treated individually as simply irregular. So how irregular is laden exactly? It is a strong verb both currently and etymologically, so in what way does it differ from most other strong verbs? —CodeCat 15:52, 2 March 2013 (UTC)
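The approach described in this comment (conjugate the base verb first, then attach the inseparable prefix and the separable part, dropping ge- when there is an inseparable prefix) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function names and the insep/sep arguments are hypothetical, only weak verbs and three forms are covered, and it is not the actual Dutch script.

```python
def conjugate_base(stem: str) -> dict:
    """Conjugate the bare weak verb; the participle is stored without ge-."""
    e = "e" if stem.endswith(("d", "t")) else ""
    return {
        "infinitive": stem + "en",
        "present_3sg": stem + e + "t",
        "participle_stem": stem + e + "t",
    }


def attach(forms: dict, insep: str = "", sep: str = "") -> dict:
    """Add an inseparable prefix and/or a separable part to base forms."""
    # ge- is left out whenever an inseparable prefix is present.
    ge = "" if insep else "ge"
    finite = insep + forms["present_3sg"]
    if sep:
        # In a main clause the separable part follows the finite verb.
        finite += " " + sep
    return {
        "infinitive": sep + insep + forms["infinitive"],
        "present_3sg": finite,
        "past_participle": sep + ge + insep + forms["participle_stem"],
    }


print(attach(conjugate_base("brems"), sep="ab"))    # abbremsen
print(attach(conjugate_base("kauf"), insep="ver"))  # verkaufen
```

Because the prefix handling is separate from the base conjugation, any special-casing written for a base verb would apply unchanged to its prefixed derivatives, which is the advantage argued for above.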
Ah, in German templates past participles are currently set up a bit differently, though I guess this could be changed if desired.
Yes, German has umlaut in 2nd and 3rd person singular present, and also in the past subjunctive. A few verbs seem to have two possible past subjunctive forms like heben, though I don't know why that is the case. (Maybe one is an obsolete form and one the modern one?)
As for -eln and -ern, I don't think they can be determined automatically. There are a few -en verbs which have stems ending in -el or -er, like verlieren.
About laden, the problem is with the suffixes. Normally, as a verb with a stem ending in -d, it would have forms like *lädest and *lädet, but this is not the case here, so the correct forms are lädst and lädt... except for the second-person plural present. That is ladet, not *ladt as expected. This is the reason why this verb is fully irregular. -- Liliana 16:05, 2 March 2013 (UTC)
Maybe I should have been more specific. Do all verbs that have unstressed -er or -el preceded by a consonant inflect this way? The script could check to make sure that it only applies it to verbs ending in vowel + 1 or more consonants + el/er. —CodeCat 16:22, 2 March 2013 (UTC)
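The check proposed in the last comment (vowel + one or more consonants + el/er) could look something like the following Python sketch. The function name and the vowel set are assumptions made for illustration, and the input is assumed to be a lowercase infinitive.

```python
import re

# Vowel, then one or more non-vowels, then "el" or "er", then the final -n.
_ELN_ERN = re.compile(r"[aeiouäöüy][^aeiouäöüy]+e[lr]n$")


def looks_like_eln_ern(infinitive: str) -> bool:
    """Return True if the infinitive matches the proposed -eln/-ern pattern."""
    return bool(_ELN_ERN.search(infinitive))


for verb in ("wandern", "sammeln", "verlieren", "spielen"):
    print(verb, looks_like_eln_ern(verb))
```

Taken literally, the pattern correctly rejects verlieren, but it also rejects a verb like feiern, where -er follows a vowel rather than a consonant, so the rule as stated would probably need refining before it could replace the separate templates.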

One thing that should also be added to the conjugation templates is this. Longtrend (talk) 17:53, 6 March 2013 (UTC)

I added some more conjugations. Many of the entries that are still in the category should not be there. This applies to all entries of the form "[prefix]zu[lemma form]". Those are special non-finite forms (zu-infinitives) that shouldn't get their own conjugation tables but rather be included in the conjugation tables of their lemma forms. (For example, abzuscheiden is the zu-infinitive of abscheiden and should be included in the latter's conjugation table. Unfortunately, there's no way to do this ATM.) Longtrend (talk) 20:41, 31 March 2013 (UTC)

Before doing anything automated, note that there are lemma forms that look like "[prefix]zu[lemma form]" but aren't. For example, hinzufügen is a lemma form, not a zu-infinitive of a nonexistent *hinfügen. (Its zu-infinitive is hinzuzufügen.) —Angr 20:53, 31 March 2013 (UTC)
Actually, just to make things interesting, hinfügen has always been very rare, but is an attested word. Your point is still very valid, though. - -sche (discuss) 21:24, 31 March 2013 (UTC)
Thanks, I didn't know about hinfügen. After writing the above, though, it did occur to me that there is a contrast between the lemma form hinˈzukommen and the zu-infinitive ˈhinzukommen. —Angr 08:51, 1 April 2013 (UTC)
I think any effort to convert a template should start with replicating what the template does. Only once we're sure that works should we start to add new things. —CodeCat 21:42, 31 March 2013 (UTC)
Good point about hinzufügen, Angr. However, I don't think there's a need to do anything automatically. IMO we should just change the verb conjugation templates such as {{de-conj-weak}} so they show the zu-infinitive. However, it should only be linked for separable verbs, because the zu-infinitive of inseparable or simplex verbs is just the two words "zu [lemma]". Also, {{de-verb form of}} should be supplemented with an option to label the zu-infinitive. Is there anyone who could do that? Longtrend (talk) 11:08, 1 April 2013 (UTC)
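The rule discussed in this thread can be sketched as below: for a separable verb the zu-infinitive is a single word with zu between the separable part and the base, otherwise it is just the two words "zu [lemma]". The function name and its arguments are hypothetical. The separable part has to be supplied explicitly, since (as the hinzufügen example shows) it cannot be reliably guessed from the spelling.

```python
def zu_infinitive(base_infinitive: str, sep: str = "") -> str:
    """Build the zu-infinitive from the base infinitive and separable part.

    For inseparable or simplex verbs, pass the whole lemma as base_infinitive.
    """
    if sep:
        # Separable verbs: one word, with zu inserted after the separable part.
        return sep + "zu" + base_infinitive
    # Inseparable and simplex verbs: two separate words.
    return "zu " + base_infinitive


print(zu_infinitive("scheiden", sep="ab"))  # abscheiden
print(zu_infinitive("fügen", sep="hinzu"))  # hinzufügen
print(zu_infinitive("verkaufen"))           # inseparable prefix
```

A template could use the same distinction to decide whether to link the form: only the one-word result for separable verbs would get its own entry.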


I started a new page WT:CSS, because it was pointed out that our main style sheet is not documented. Michael Z. 2013-03-01 21:34 z

Thanks. I found it useful already, directing me to the commonprint stylesheet which accounts for the printing of urls (though it doesn't account for why full urls are needed in index boxes). DCDuring TALK 13:12, 25 March 2013 (UTC)
You’re welcome. I’d forgotten how frustrated I had been with understanding the style sheets, until I realized others were too. Michael Z. 2013-03-25 14:51 z
It is motivating me to learn enough CSS so I can inject it (possibly inappropriately), but, one hopes, be a better judge of what can be done by someone with better CSS-fu than I'm likely to ever have. DCDuring TALK 15:18, 25 March 2013 (UTC)

Egyptian quotations

I'm interested in adding quotations to, and generally improving, Egyptian (hieroglyphic) words. Is there any policy about how Egyptian quotations should be formatted - i.e. hieroglyph vs. transliteration; how to reference stelae and their authors? I did a short test run at 𓍁𓈖𓏭𓆱 - is that kinda how it should go? Hyarmendacil (talk) 03:48, 3 March 2013 (UTC)

In general yes. You may wish to create templates for referencing specific works (see Category:Latin quotation templates for examples). DTLHS (talk) 07:29, 3 March 2013 (UTC)
Also add a hieroglyphic representation of the citation if it's not a problem, with the cited part bolded in wiki-syntax (even if it's not bolded in the actual font in the browser).
Those abbreviations seem a bit opaque - I'm sure they're well-known to your average Egyptologist, but not necessarily so for everyone else. E.g. for Sanskrit there is the {{sa-a}} template for such a purpose, which accepts a common abbreviation as a parameter and links to the respective appendix page where the abbreviation is further explained. To see how it's used in practice in combination with inline citations, see e.g. the entry for सूनु (sūnu) (the very first definition). --Ivan Štambuk (talk) 19:31, 3 March 2013 (UTC)
It is probably best to enclose the quote in {{lang|egy}}, so that the appropriate language-specific fonts and styling are used. —CodeCat 19:43, 3 March 2013 (UTC)
Not just the abbreviations are opaque: uncommon words like electrum should be wikilinked. Chuck Entz (talk) 03:07, 4 March 2013 (UTC)
Even in citations? I wouldn't mess with hyperlinks to words in citations, but I don't see any reason to encourage them.--Prosfilaes (talk) 05:08, 4 March 2013 (UTC)
Why not? I'm not being flippant, I really don't know why having links in any chunk of text, citation or otherwise, would be discouraged. -- Eiríkr Útlendi │ Tala við mig 05:23, 4 March 2013 (UTC)
Because in "and my staff in ebony decorated with electrum", wikilinking electrum is highlighting something whose meaning is irrelevant to the entry 𓍁𓈖𓏭𓆱. Clicking that link won't explain 𓍁𓈖𓏭𓆱 in any way.--Prosfilaes (talk) 06:33, 4 March 2013 (UTC)
Perhaps not in this case, but quotes provide context, and you're missing some of the context if you don't understand the other words. It's not the main purpose of the quotes, but sometimes it's helpful. Chuck Entz (talk) 06:49, 4 March 2013 (UTC)
I think wikilinking is a rather minor issue really. Ivan Štambuk: The abbreviations are a standardised form of Unicode-transliteration - like in the Sanskrit quotation you linked: yes, they're opaque to non-initiates, but trying to read the hieroglyphs is worse. Egyptian is an opaque language however you write it, and transliteration is almost always a prerequisite to translation. I do agree that the quotation would benefit from the hieroglyphs, but entering these is very effort-intensive, as it has to be done via Manuel de Codage (see 𓍁𓈖𓏭𓆱 again). Finding all the determinatives in Gardiner is a real pain. Hyarmendacil (talk) 07:41, 4 March 2013 (UTC)
I think you misunderstand what he was referring to when he said "abbreviation". At least for me, it's the whole "Ity, BM EA 586" source that gives me no idea of what it's referring to.--Prosfilaes (talk) 09:52, 4 March 2013 (UTC)
Yes, I meant the Ity, BM EA 586 part. --Ivan Štambuk (talk) 17:03, 4 March 2013 (UTC)

Oh, sorry to misunderstand. Those are there because I wasn't really sure how to cite the work. The situation is not the same as in Sanskrit as I'm only working from assorted stelae and not something as well-documented as the Vedas (I don't have the text for the Book of Coming Forth but that would be really handy). "Ity" is the 'author' - or at least, commissioner, the actual scribe being anonymous. "BM EA 586" is the museum designation: British Museum, Egyptian Antiquities, 586. I do realise that it's rather a bad way to cite the work, but it's the only form of cataloguing there is. The only other thing would be just to call it 'the Stela of Ity'. We can't really call all the stelae 'untitled' by 'anon.'. Does anyone have any ideas? And also, what about the dating of the works? BC numbers will only be approximate; how about dating by dynasty? Hyarmendacil (talk) 08:46, 5 March 2013 (UTC)

Why not write out "British Museum, Egyptian Antiquities, 586"?--Prosfilaes (talk) 11:34, 18 March 2013 (UTC)

French words present here and absent from fr.wikt

You might have a look at fr:Utilisateur:Darkdadaah/Diff/en/2013-03, a large page with more than 14000 French words present here and absent from fr.wikt. I looked only at verb forms. Many feminine or plural past participle forms are defined here while I told my fr.wikt bot they don't exist, e.g. claudiquée. All of them should be checked very seriously, and many should probably be deleted. There are many other conjugation mistakes, e.g. caqueteraient. And there are a few verbs which simply don't exist in French, e.g. déphlogistiguer (but there are some verbs still missing at fr.wikt, too, e.g. surtitrer, and some verbs present in fr.wikt but not yet considered by my bot when the list was generated, e.g. contenir). Lmaltier (talk) 09:36, 3 March 2013 (UTC)

  • After deleting bad entries that are bot-generated, please correct the conjugation template of the base verb and add it to User:SemperBlottoBot/feedme so that the bot can have a second attempt. SemperBlotto (talk) 09:58, 3 March 2013 (UTC)
Many forms will simply be correct but fr.wikt doesn't have them yet. And we have a different CFI to them. @SemperBlotto {{fr-past participle}} has an intr parameter for intransitive. Mglovesfun (talk) 10:02, 3 March 2013 (UTC)
The French page is just too big to handle easily. Could someone generate a simple list of possibly bad verbs that I could check? SemperBlotto (talk) 08:07, 4 March 2013 (UTC)

Diacritics in French capital letters

In Edouard and other pages, we can read: "In traditional French orthography, capital letters do not take diacritics, so É becomes E." No, this is wrong. In French orthography, diacritics are kept (e.g. you can read on many French town halls the capitalized French motto LIBERTÉ, ÉGALITÉ, FRATERNITÉ). And typographers know this basic rule. However, diacritics on capitals were not available, and therefore not used, on traditional typewriters. Some newspapers or printed books may forget them too, if not carefully produced. Anyway, this is a typographic or technical issue, not an orthographic issue. Lmaltier (talk) 09:51, 3 March 2013 (UTC)

I wouldn't say I disagree so much as it's not what I was taught. I was taught it's an error to put the accent on a capital letter. I believe at least some people consider this to be the case so we can't blanket change all such instances (Egypte is another). Let's do some research before we change anything. Mglovesfun (talk) 09:56, 3 March 2013 (UTC)
Things may have changed since 2000, but traditionally this matter varies by country. At least in the 20th century, capitalized text in France normally did not have diacritics (exceptions: signage and monumental texts, such as signposts with town names often did have diacritics...the rule for dropping the diacritics was for texts in books, magazines, and brochures). In Canada, it was a different story...all capitalized texts in Canada retain the diacritics. Some longstanding typographical conventions began to change in the 1990s due to computerization, so maybe this rule is different now. —Stephen (Talk) 10:09, 3 March 2013 (UTC)
I asked a friend on Facebook because I wanted a non-Wiktionary opinion:
"On apprend qu'on ne met pas d'accents lorsque c'est une majuscule, mais de plus en plus de personnes vont te donner des mots avec accents meme lorsque c'est une majuscule. Les deux se valent."
"We learn that you don't put an accent on when it's a capital letter, but more and more people will put an accent on even when it's a capital letter. Both are ok." (My translation)
Mglovesfun (talk) 10:49, 3 March 2013 (UTC)
  • Looking through an assortment of French books in front of me, published variously in the last 15 years, all of them use diacritics with capital letters. Ƿidsiþ 10:55, 3 March 2013 (UTC)
  • Looking through various French books I have to hand, I find that there is a lot of variation. Two rules seem to be categorical in these books: (1) Ç is never written C; and (2) text in all-caps always retains all diacritics. In addition, I find two apparent tendencies: (3) the preposition À loses its accent particularly often; and (4) short snippets in front-matter and back-matter keep more diacritics than full paragraph-style prose. —RuakhTALK 18:17, 3 March 2013 (UTC)
  • I don't know French, myself, and haven't checked any for accent prevalence, but based on what everyone's written above I propose that we have an entry for whichever form is attested, and that if the accented spelling is attested and not rare then it (even if less common) be the primary entry (with the other a soft redirect).​—msh210 (talk) 05:50, 4 March 2013 (UTC)

The Académie Française tell us that we must put the accents on capitals. Believing that they're not mandatory is a common misconception. BanunterX (talk) 22:09, 27 April 2013 (UTC)

P.S.: Here's a list of pages that should be deleted. There's even a template. BanunterX (talk) 22:11, 27 April 2013 (UTC)
Are you saying that there are no written texts in existence that might use those spellings? Did they all just disappear when the spelling was changed? —CodeCat 22:30, 27 April 2013 (UTC)
The article doesn't even say they're mandatory. And of course these spellings exist. We don't delete obsolete spellings as soon as a newer spelling comes along. And these aren't even obsolete; they're still used. The fact that some people don't like them isn't irrelevant (hence the usage notes) but isn't relevant to keeping or deleting. Mglovesfun (talk) 09:54, 28 April 2013 (UTC)
After I posted my comment I thought about it and now I totally agree that we shouldn't delete them. But keep in mind that the recommended spelling is and has always been with diacritics. But since a lot of people believe we don't have to put the diacritics on, the articles should stay. BanunterX (talk) 14:40, 2 May 2013 (UTC)
The Académie Française link that people are citing doesn't actually say that accents are obligatory, but recommended (and for good reasons, in my opinion). Also it hasn't 'always been the case', unless you can find some evidence that it has been. Were they recommended in 1700 for example? By who? The Académie Française didn't exist yet. Mglovesfun (talk) 10:48, 3 May 2013 (UTC)

Enabling User:Yair rand/FindTrans.js by default

Previous discussion: Wiktionary:Grease pit/2013/February#User:Yair rand/FindTrans.js

User:Yair rand/FindTrans.js is a script that, when a user searches for a word that is listed as a translation on an entry but we don't have an entry for, causes the search page to show "(word) is a (language) translation of (word) (gloss)". It's testable via WT:PREFS ("Show translations-listings of words without entries on the search page."). What do people think about enabling it by default? --Yair rand (talk) 05:05, 4 March 2013 (UTC)

If it isn't buggy then I see little harm (only slowness) and a good deal of benefit, so support. (I haven't tried it much and don't know whether it isn't buggy, but have no reason to think it is.)​—msh210 (talk) 06:40, 4 March 2013 (UTC)
  • Well, no one has objected to it in the couple weeks this thread has been open, so I've enabled the script by default. --Yair rand (talk) 22:50, 19 March 2013 (UTC)

There's no Chechen Swadesh list on Wiktionary!

And what about other "North Caucasian languages" (= Circassian/Adyghe, Abkhaz, Ingush, Avar, Lak, Lezgic languages)? THERE IS NOTHING! This is a kind of "DISCRIMINATION"!!! I wrote about that on User talk:Mglovesfun, Appendix talk:Swadesh lists, User talk:Stephen G. Brown pages... And I found a Chechen-English Dictionary in Latin alphabet (including English-Chechen); but they are saying: "All the words must be written in Cyrillic!", etc. (If they want, they can find a Chechen dictionary with Cyrillic script... and they can use my dictionary as "Romanized Chechen".) Regards, Böri (talk) 13:31, 4 March 2013 (UTC)

This is a wiki. Content is written by volunteers. We don't have volunteers for the Chechen language, so no one has created a Swadesh list for Chechen. --Vahag (talk) 14:04, 4 March 2013 (UTC)
I want to write the Chechen list in Latin alphabet... but they said: "All the words must be written in Cyrillic!" and I said "Then you find it!" and "The Chechen people are NOT a Slavic people", etc. Böri (talk) 14:33, 4 March 2013 (UTC)
What does Chechen being not Slavic have to do with it? —CodeCat 15:45, 4 March 2013 (UTC)
The Cyrillic script is not their own script. (I wanted to write a Swadesh list for Chechen in Latin alphabet... but they said: "All the words must be written in Cyrillic!", etc.) Böri (talk) 15:51, 4 March 2013 (UTC)
Who are the "they" you mention here? Did you do this at Wikipedia? Wiktionary does not have any such requirements, so far as I understand it. The bottom of the Appendix:Swadesh_lists#Assorted_Swadesh_lists box explicitly states, "The list may also include: (t) the transcription in Latin characters; (p) the phonetic transcription".
In other words, I don't think you'll encounter much opposition here at Wiktionary, if you decide to create a Swadesh list for Chechen that uses the Latin alphabet. Go right ahead. -- Eiríkr Útlendi │ Tala við mig 16:25, 4 March 2013 (UTC)
Amended above upon reading more.
Böri, it looks like the table could include the Latin-alphabet transcription as one of the columns, but it should also include a main column using the script most often used for the language in question. For Arabic languages, editors are expected to use the Arabic script. For Korean, Hangul. For Inuktitut, Canadian Syllabics. So for Chechen, the main entry column should be in Cyrillic, as this is the script most often used to write Chechen.
This has nothing to do with discrimination, at least not on the part of Wiktionary. Wiktionary aims to be descriptive -- describing how things are -- rather than prescriptive -- describing how things should be. -- Eiríkr Útlendi │ Tala við mig 16:33, 4 March 2013 (UTC)
Wikipedia says that Chechen was historically written in Cyrillic only, but it recently started using the Latin alphabet. I think that may be a misunderstanding then. Whoever removed your work probably didn't realise it. So Cyrillic isn't their own script now but it still used to be. (And aside from that, many non-Slavic languages are written in Cyrillic. Many non-Slavic minority languages in Russia are, as are Mongolian, Kazakh, etc.) —CodeCat 16:29, 4 March 2013 (UTC)
All languages of Russia are written in Cyrillic, no exceptions. -- Liliana 17:51, 4 March 2013 (UTC)
From Wikipedia, I gather that it is currently written in both. And although Cyrillic is official, I very much doubt that it's the only script in use, considering the politically tense situation that still exists there. I imagine that since Cyrillic is the hallmark of Russia (the "oppressor" according to nationalists), they probably use Latin as a symbol of independence. So most likely, Latin is still being used unofficially, like Böri indicates. —CodeCat 17:57, 4 March 2013 (UTC)
Apparently I missed this bit: "The choice of alphabet in Chechen is politically significant (as Russia prefers the use of the Cyrillic script, against the separatists' preference for Latin)." CodeCat 17:58, 4 March 2013 (UTC)
  • @Liliana, that statement might need some qualification -- Korean and Chinese are spoken in parts of Russia, and those two at least are generally not written in Cyrillic. Perhaps your comment is limited to the official languages of Russia? -- Eiríkr Útlendi │ Tala við mig 18:03, 4 March 2013 (UTC)
Not just that, it also applies to languages without official treatment. Chinese, as in the Dungan variety, is actually written in Cyrillic. -- Liliana 18:54, 4 March 2013 (UTC)
There is one more exception - Karelian. It's written in Roman letters but its written form is almost non-existent. I don't think there are any complete Karelian-other language dictionaries.
Böri, I'm Russian. If you think that we discriminate against Chechens and the Chechen language, please add some Chechen content. There is no discrimination here, only not enough contributors for any language. You can't demand anything.
If you create entries in a script which is not used by the majority of Chechens or by Chechen media, but only by some nationalists abroad, it's useless. The Cyrillic spelling for Chechen is official and widespread; it's used by the Chechen government, media and people. The Roman spelling for Chechen is unregulated; it's more like a chat script with all possible variations. Sorry, not interested in long talks. If you want to make a difference, do some work; if you want to make a point, join some Wikipedia discussion. We work with languages as they are used, not with politics. Any language is welcome, including Chechen, but it has to be right. --Anatoli (обсудить/вклад) 21:33, 4 March 2013 (UTC)
Chechen-Russian Dictionary in Cyrillic script: If you want, you can make a list. Böri (talk) 08:50, 5 March 2013 (UTC)
The list could be created by a person who knows the language, at least to some extent. Do you know Chechen? If you want, I can create a template, where one just needs to add words. There are a few conditions - the words need to be in the lemma form (the dictionary form), and they have to be in the correct script - Cyrillic is correct for Chechen. The dictionary above is one-sided. One can't find translations from English into Chechen. We have no policy on the Chechen language, simply because we never had Chechens. Take a look at these Category:Chechen_nouns, Category:Chechen_pronouns and Category:Chechen_adjectives, though. Here's another list I could find:
For transliteration, I would use Modern Latin, as in w:Chechen_language#Alphabets

--Anatoli (обсудить/вклад) 10:18, 5 March 2013 (UTC)

Böri, I just see a total failure to understand, either a failure or a refusal to understand. Take a valid word we don't have, say hoyau (French). We don't lack it because we're discriminating against French for political reasons; it's just that nobody's created it. To be honest I think you're perfectly capable of understanding, but it's harder for you to make a political point if you understand what we're saying to you, so you're deliberately ignoring us. Mglovesfun (talk) 11:08, 5 March 2013 (UTC)
Anatoli's list is good! Böri (talk) 12:25, 5 March 2013 (UTC)

Putting large tables into subpages?

I can't remember if this has been suggested/discussed before. It would help pages to load faster if large tables (mostly conjugations and translations) were put into subpages. Other wiktionaries do this (French, Italian, German and maybe more). I have experimented at parlare. We might have to think of a naming convention - parlare has two language sections with conjugations and I have only moved the Italian one. Any thoughts? SemperBlotto (talk) 19:34, 4 March 2013 (UTC)

Poking around the FR WT, the tables on those separate pages load lickety-split. Experimenting in the Edit view with transcluding those table pages into the main entry page also shows very fast load times. This makes me think that just having them on the main page wouldn't change performance much. This also makes me think that any pages we have here on the EN WT that load slowly due to tables might be because we've gotten overly fancy, or underly optimized.
(I also note that the FR WT wiki code is quite elegant indeed, and makes much use of transclusion -- though I haven't dug beyond the first layer of transclusion so far.) -- Eiríkr Útlendi │ Tala við mig 20:25, 4 March 2013 (UTC)
Since nobody is particularly bothered, one way or the other, I shall put it back the way it was. SemperBlotto (talk) 10:54, 9 March 2013 (UTC)
I think the status quo is okay. Inflection tables seem unlikely to take too much time to load, anyway. What I expect to take longer time to load are extensive translation tables. --Dan Polansky (talk) 11:00, 9 March 2013 (UTC)
I dunno, does manual inflection like prosum or memini take considerably longer to load for you? —Μετάknowledgediscuss/deeds 17:30, 9 March 2013 (UTC)

Japanese On readings

"Katakana are used to indicate the on'yomi (Chinese-derived readings) of a kanji in a kanji dictionary." Katakana - Wikipedia
"Generally, On-yomi are written in CAPITAL LETTERS (katakana in kana), and Kun-yomi are written in lowercase letters (hiragana in kana), the On-yomi usually having some sort of break to separate the kanji from the extra kana involved (called okurigana (おくりがな))." On-yomi and Kun-yomi (Romaji version) -
"On-yomi, on the other hand, is mostly used for words that originate from Chinese, which often use 2 or more Kanji. For that reason, on-yomi is often written in Katakana." Kanji - Learn Japanese

The on readings of kanji should be in katakana (カタカナ), not hiragana (ひらがな). Hiragana is for native Japanese and katakana is for loan-words. On readings come from China, while the kun readings are native to Japan. Besides the underlying reason why on-yomi should be written in katakana, there are many other sites (in particular Japanese domain sites) that follow this guideline. Even on sites and applications that don't use kana, the on-yomi reading is written in uppercase while the kun-yomi is written in lowercase, further distinguishing the two types of readings. The following sites should provide ample proof: Denshi Jisho, WWWJDIC, goo.

As for how to correct this mistake, most of the on readings could be corrected with a bot script that targets "On: " in Japanese-related pages and changes hiragana characters to katakana. As for other instances where on-readings are used, those may need to be manually changed. -- 20:34, 5 March 2013 (UTC)
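If such a conversion were wanted, the core of the bot script described above could be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and it assumes the readings appear as plain text after "On: " with one reading type per line, which may not match the actual entry markup. It relies on the fact that the hiragana block (U+3041 to U+3096) and the corresponding katakana (U+30A1 to U+30F6) are offset by 0x60 in Unicode.

```python
import re

HIRAGANA_START = "\u3041"  # ぁ
HIRAGANA_END = "\u3096"    # ゖ
KANA_OFFSET = 0x60         # distance from a hiragana to its katakana


def hira_to_kata(text: str) -> str:
    """Replace every hiragana character with the matching katakana."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_START <= ch <= HIRAGANA_END else ch
        for ch in text
    )


def convert_on_readings(wikitext: str) -> str:
    """Convert only the text that follows "On: " up to the end of that line."""
    return re.sub(r"(?<=On: ).*", lambda m: hira_to_kata(m.group(0)), wikitext)


sample = "On: かん, けん\nKun: みる"
print(convert_on_readings(sample))
```

Lines that do not contain "On: " are left untouched, so kun readings written in hiragana on their own lines would be preserved.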

Who are you and why does this matter to you? The above appears to be your sole post.
FWIW, my copy of Shogakukan's 国語大辞典 (Kokugo Dai Jiten) uses hiragana for all listings except 外来語 (gairaigo, foreign loanwords). Same for my copy of 大辞林 (Daijirin). Same for my copy of the 新明解国語辞典 (Shin Meikai Kokugo Jiten). I would hardly classify this use of hiragana as a "mistake".
Whether to use hira- or katakana appears to be convention, and not a hard-and-fast rule. As such, I'm perfectly happy not undertaking the major task of changing all on'yomi from one to the other, especially when it would bring about no real change in the site's usability.
If another editor decides that doing so would be a good use of their time, I don't think I'd necessarily be opposed. However, when there are so many other more important and useful things to do, I wonder why on'yomi kana would be an issue. -- Eiríkr Útlendi │ Tala við mig 22:20, 5 March 2013 (UTC)
... Same for Kenkyūsha Online (paywalled). Same for Eijiro when searching E>J and yomigana display is enabled. Same for 世界大百科事典 (Sekai Dai Hyakka Jiten), or the デジタル大辞泉 (Dejitaru Daijisen), or the 百科事典マイペディア (Hyakka Jiten Maipedia), all as shown for example on the 鼎 entry at Kotobank... -- Eiríkr Útlendi │ Tala við mig 18:23, 6 March 2013 (UTC)
I'm the OP. Sorry about that, I previously didn't have a reason to make an account and the username I preferred had already been taken on a different linked service. I am not a native of Japan or Japanese in any way. I simply dislike inconsistencies.
I have come to realize that I might have been slightly overzealous. The only time that it really needs changing is when it's specifically stated as on-yomi to show that it is not originally native to Japan, such as on pages like . Outside of that context, hiragana is indeed used for readings most of the time as, even though those readings have a foreign origin, they are not considered foreign words (such as アイスクリーム). From what I understand, katakana is similar to italicizing in English in some respects. --Soardra (talk) 06:45, 17 March 2013 (UTC)
Hello Soardra, thank you for clarifying. In just about every dictionary I've looked at, readings considered to be on'yomi are written in hiragana. The only sense on the entry that might take katakana would be for reading ī, coming from modern Mandarin () -- which we don't have yet. Notably, this reading is considered to be a 外来語 (gairaigo, foreign loan word), and thus naturally takes katakana. Was there something else on the entry that you had in mind? -- Eiríkr Útlendi │ Tala við mig 07:24, 17 March 2013 (UTC)
After some thought, I'd like to clarify what I meant in my previous statement. I believe that katakana should be used for on readings only as they appear in the Readings heading of Kanji entries to better distinguish the origin of the reading. --Soardra (talk) 14:02, 6 May 2013 (UTC)
  • That's certainly a more limited scope (and thus easier to implement), but 1) the sizable corpus of existing kanji entries all use hiragana for the ===Readings=== section, so changing this would be non-trivial; 2) these hiragana spellings all link to the corresponding hiragana entries, so changing all of these to katakana would again be non-trivial (though I suppose you could have katakana as the displayed script, and actually link to the hiragana entries, but that would be confusing); and 3) the readings section already clearly indicates (or at least should do so, if properly formatted) which readings are on'yomi, which are kun'yomi, etc.
I somewhat understand your desire for using katakana for on'yomi, but I don't think it's that important, and making the change would be a considerable amount of work, for no notable gain in usability. -- Eiríkr Útlendi │ Tala við mig 17:15, 6 May 2013 (UTC)

Whitelist Choor monster

Because I'm a naughty boy I can't edit WT:WL any more, but it's definitely about time that User:Choor monster stopped having to have all of his/her edits approved, right? Equinox 22:16, 5 March 2013 (UTC)

I nominated him in your name. — Ungoliant (Falai) 12:21, 9 March 2013 (UTC)


Is anyone using the .gender-period class to show the hidden periods after abbreviated gender templates? For example, changing “m pl” to “m. pl.”

If so, please change your code to the following:

/* add a period after abbreviated genders and numbers */
abbr.gender:after, abbr.number:after { content: '.'; }

This will work without any extra code in the relevant templates, which I will remove shortly. Previous discussion was at Wiktionary:Grease_pit/2013/February#How should Category:Gender and number templates be converted to Lua? Michael Z. 2013-03-05 22:35 z

sh User languages - Bosnian, Croatian, Serbian merged into Serbo-Croatian

I have moved the bs, hr, sr language user categories and templates to sh. Please update your Babel, if you're using one of the language templates more than once! --Anatoli (обсудить/вклад) 07:14, 8 March 2013 (UTC)

I'm not sure if that's a good idea. While we can at least have a consensus that we treat them as one language on Wiktionary, it goes a bit far to expect everyone else to declare their own native language as such. We can at least accommodate them by allowing them to choose. Ivan Štambuk for example has both "sh" and "hr" on his page, even though he was one of the main proponents of merging them. —CodeCat 13:35, 8 March 2013 (UTC)
Ivan Štambuk told me he had both because he wanted to make clear that he was a native of Croatian variant of sh. I followed his example and put two on my Babel also: "sr" and "sh". I don't have any strong feelings about this merging, but I think it is a good idea to have an option of labeling somehow which variant you natively speak. Maybe making a subtemplate of some kind. --biblbroksдискашн 19:46, 9 March 2013 (UTC)
I think that was already done some time ago, but it never really got off the ground and I don't remember exactly how it ended up being done. —CodeCat 19:53, 9 March 2013 (UTC)
Does an American who has never heard a New Zealand, Australian, etc. accent really need to mark it specifically? Marking {{Babel-8|Cyrl}} may suffice to let users know that one knows or doesn't know Cyrillic. Serbs and Montenegrins are comfortable with Roman letters, Croatians and Bosnians are less comfortable with Cyrillic. I think it doesn't make sense to treat the knowledge of Serbo-Croatian varieties as the knowledge of different languages if we decided that we treat them as one language. Arguments still happen around words like hleb/hljeb/kruh/kruv or šta/kaj but Serbo-Croatian speakers may prefer to decide whether they want to reveal their origin or just claim to know Serbo-Croatian. --Anatoli (обсудить/вклад) 08:54, 10 March 2013 (UTC)
I’ve been watching WT:FB for a while, and the number of Americans who complained about New Zealand English and Australian English being considered the same language as American English is zero. On the other hand, there were hundreds who complained about the Serbo-Croatian merger. Obviously this is a much more sensitive issue than English. Merging the Babel templates will do no good at all; all it will do is piss off contributors who consider Croatian/Serbian/etc. to be various languages. We can force people to edit Serbo-Croatian as one language, but we can’t force them to believe it is one language. — Ungoliant (Falai) 13:20, 10 March 2013 (UTC)
Agreed. User pages don't need to be coerced to the same standards main space is.--Prosfilaes (talk) 08:08, 11 March 2013 (UTC)

FWOTD is lacking variety and quoted entries

FWOTD currently doesn't have much variety, and many of the listed nominations are still lacking quotations and/or pronunciation, which is making it hard to keep it going. If anyone could help out by adding quotations and pronunciations to the nominations, and by nominating new words with both as well, that would be very useful. —CodeCat 15:16, 8 March 2013 (UTC)

Bot to clean the Wiktionary:Sandbox and Template:Sandbox

Hello everyone, we need a bot cleaning Wiktionary:Sandbox and Template:Sandbox but my bot request to run one was closed on the basis that a discussion should occur deciding how often the bot should clean the sandbox. The relevant BRFA can be found here. Normally, my sandbot runs every hour but here, it seems something closer to six hours is more appropriate (IMHO anyways). What are your thoughts? -Riley Huntley (SWMT) 08:16, 9 March 2013 (UTC)

It shouldn't be cleaned too often, because we don't want to wash away someone's sandcastles. If it is cleaned more, then there is also a higher chance that the cleanup will accidentally wipe away something that someone is working on. —CodeCat 10:16, 9 March 2013 (UTC)
I agree. No more often than once per day. SemperBlotto (talk) 10:51, 9 March 2013 (UTC)
Okay, once a day is reasonable. Also, the "sandcastles" aren't just washed away, a user can always retrieve their information from the history. There is also a warning on the page for this reason; "Any content added to this page may be deleted in twelve hours or less. Do not use this page for anything that you want to keep." -Riley Huntley (SWMT) 20:29, 9 March 2013 (UTC)
Around four hours is ideal IMO, but I’ll support the bot for any period >= one hour. — Ungoliant (Falai) 20:47, 9 March 2013 (UTC)

The page history is potentially irrelevant to someone who’s just learning wikitext.

Does the bot leave the sandbox alone if someone has edited it in the last hour? There should be some minimum idle period that a new user should have to keep messing with the sandbox. Wiping it out while they are refreshing the page would just be a discouraging prank.

And if the sandbox is wiped out every six hours, then the message should be updated to reflect the reality. Michael Z. 2013-03-09 21:23 z

I agree (w/Michael Z.). —RuakhTALK 00:51, 10 March 2013 (UTC)
  • The bot checks to see if the page has been edited in the last 15 minutes. If it has been, it delays itself for another 15 minutes. -Riley Huntley (SWMT) 04:25, 10 March 2013 (UTC)
I really don't see the need for such a bot. But if others do, then how about cleaning daily but no less than two hours since the last edit? (And incidentally what's BRFA? I mean, I gather it means a bot-approval vote, but what is it supposed to stand for? Is it some kind of enWP jargon?) —msh210 (talk) 04:29, 10 March 2013 (UTC)
Re: parenthetical question: Yup, enWP jargon; w:WP:BRFA is w:Wikipedia:Bots/Requests for approval. —RuakhTALK 07:12, 10 March 2013 (UTC)

Looking at a few pages of the sandbox’s history, I would suggest waiting 30 minutes before cleaning. Most experimenters are editing for just a few minutes, but some spend 20–30 minutes there. Of course there’s no indication of whether they are then finished, or perhaps refer to the results a few minutes afterwards. Here’s a longer editing session, interrupted. Michael Z. 2013-03-10 17:04 z

  • It is fine by me to extend the delay, we just all need to be able to decide on how long. :) -Riley Huntley (SWMT) 21:28, 10 March 2013 (UTC)

Discussion here has slightly stagnated, and our worthy Wikipedian seems anxious to run this bot — not to mention that a number of others also think it's a good idea. We have the following proposals on the table:

  • every 1 hour (proposition, vote)
  • every 6 hours (SGB, vote)
  • every 24 hours, no more often (SB, vote and here)
  • only if idle 2 hours (MZ, vote; myself, here)
  • every 4 hours (Ungoliant, here, who will support any interval of one hour or longer)
  • only if idle 0.5 hours (MZ, here)

Based on this, I think that a vote for the bot to empty out the Sandbox every 6, or every 24, hours, but no less than an hour since the last edit, will likely pass. (AFAICT now, I'll support either of those, myself, fwiw.) I recommend that you (Riley Huntley) start a vote with one of those options (or both as alternatives, if you like); of course, you can feel free to start any vote you like, if you read the above differently. Does anyone disagree with or want to voice some objection to my summary? Or, on another note, does anyone have a thought on the bot that has not already been voiced?​—msh210 (talk) 05:36, 12 March 2013 (UTC)

  • Works for me, although I am going on a vacation for a short while so I will have to start the vote when I am back. -Riley Huntley (SWMT) 23:26, 12 March 2013 (UTC)
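The combined rule summarised above (clean on a fixed schedule, but never sooner than some idle period after the last edit) can be sketched as a single check. This is only an illustration of the proposal, not the bot's actual code; the function name, parameter names, and default values are assumptions chosen to match the 24-hour and one-hour figures discussed.

```python
from datetime import datetime, timedelta


def should_clean(
    now: datetime,
    last_clean: datetime,
    last_edit: datetime,
    clean_interval: timedelta = timedelta(hours=24),  # assumed schedule
    min_idle: timedelta = timedelta(hours=1),         # assumed idle period
) -> bool:
    """Return True only if the schedule is due AND the page has been idle."""
    schedule_due = (now - last_clean) >= clean_interval
    page_idle = (now - last_edit) >= min_idle
    return schedule_due and page_idle
```

If the schedule is due but someone edited recently, the check simply returns False, and the bot would try again later rather than wiping a session in progress.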

Do we even need a sandbox?

I just realised we can try a completely different approach. We're already in the habit of using a subpage of our own user page as a sandbox. So do we actually need a central sandbox? On our current sandbox page, we could provide a link to a user's own sandbox (Special:MyPage/sandbox) and then lock the page so that users can't accidentally edit it. How is that? —CodeCat 00:09, 14 March 2013 (UTC)
What about IPs? — Ungoliant (Falai) 00:14, 14 March 2013 (UTC)
Don't IPs have user pages? —CodeCat 00:25, 14 March 2013 (UTC)
I mean, won’t it be a waste of space to let IPs create their sandboxes? — Ungoliant (Falai) 00:31, 14 March 2013 (UTC)
I suppose so. Maybe we should keep the sandbox open for them then. But linking to a user's subpage is still a good idea I think. We could also add a message saying that getting such a page is one of the benefits of registering, so that you get to keep your sandbox edits indefinitely. —CodeCat 00:36, 14 March 2013 (UTC)
Good idea. But then that doesn't obviate the need (if there is one at all) for the bot proposed here, so this conversation is orthogonal to the one that started above.​—msh210 (talk) 06:30, 14 March 2013 (UTC)
Great idea, CodeCat.​—msh210 (talk) 06:30, 14 March 2013 (UTC)
I've split it off. This brings up new questions though. What is there consensus for: keeping it as-is, adding a notice encouraging registered users to use their own sandboxes, or closing the sandbox and directing even IPs to their IP-specific sandbox? —CodeCat 15:34, 14 March 2013 (UTC)
Horrible idea. It would leave terabytes of crap all over the wiki. Just leave things as they are - it ain't broke, don't try to fix it. SemperBlotto (talk) 15:39, 14 March 2013 (UTC)
I don't think we 'need' a sandbox more than we 'need' WT:LOP, both are really just to attract vandals away from the main namespace. Mglovesfun (talk) 15:43, 14 March 2013 (UTC)
You don't even have to save in your own userspace half the time: I mostly use the temporary Preview function as a "sandbox". (That doesn't show a few things, though, like the category footer.) Equinox 20:59, 14 March 2013 (UTC)
Categories show on preview.​—msh210 (talk) 21:31, 14 March 2013 (UTC)
Keep old system per Semper. —Μετάknowledgediscuss/deeds 22:39, 14 March 2013 (UTC)

Wiktionary:Etymology scriptorium/2013/March

As much as I would love to ignore KYPark (talkcontribs) and his voluminously pointless "contributions" to the Etymology scriptorium, I have trouble accepting his creation of a subpage for March and copying all the posts from this month to it. This has the effect of removing those threads from watchlists and possibly also interrupting their edit histories in violation of our licensing. Was anyone else aware of this? And what, if anything, should we do about it? Chuck Entz (talk) 06:51, 10 March 2013 (UTC)

Closer examination shows that all of the material was posted by KYPark, so the copyright issue is pretty marginal. It will be a bit of a mess when anyone else wants to post anything, though. Chuck Entz (talk) 07:46, 10 March 2013 (UTC)
I don't see how it's different from any other archiving. If the main ES page is on your watchlist, you must have noticed him removing 4861 K of text, and it's easy enough to add the March subpage to your watchlist. But it is frustrating that it's so difficult to have an actual discussion of etymology at ES because of his lengthy diatribes and discussions with himself. —Angr 16:29, 10 March 2013 (UTC)
I agree. I've tried to get through to him but he just doesn't seem to "get it". I don't really want to get more forceful with him but... what else can we do? —CodeCat 17:41, 10 March 2013 (UTC)
The only way I've ever gotten through to him was by making concrete threats. That evidently worked, because although far fewer of his edits are actually useful mainspace edits now, at least those that he does make are always good (at least in English and Korean, I can't judge the rest). However, if you see him doing something questionable (especially in etymologies), please tell me. —Μετάknowledgediscuss/deeds 17:47, 10 March 2013 (UTC)
So what are we going to do? —CodeCat 15:31, 14 March 2013 (UTC)
I propose that we move every similar topic he starts to User:KYPark/FOO (where FOO is the topic’s title) from now on. — Ungoliant (Falai) 15:51, 14 March 2013 (UTC)
Agreed. It would be nice if we could tell him not to start them in the ES in the first place, though. —CodeCat 15:53, 14 March 2013 (UTC)
I did tell him in the most recent discussion (WT:ES#dung beetle, see my first comment), but he answered with his typical poetaster discourse. Someone should tell him in his user page though. — Ungoliant (Falai) 16:02, 14 March 2013 (UTC)
If he could just post this stuff to a blog website and not here, that would be the ideal situation. Mglovesfun (talk) 16:30, 14 March 2013 (UTC)
But just what is "this stuff" exactly? I think most of us know what it is, but how do we explain it? We can't just say he should not post anything we think is more of "that stuff"... We need a more objective criterion so he also knows what we expect of him. —CodeCat 16:42, 14 March 2013 (UTC)
"This stuff" is speculative etymologies based on superficial similarities, whether semantic or phonological. He has stated that he is opposed to positivism, and his proposed etymologies reflect that, but he needs to accept that the vast majority of editors around here expect etymologies—and discussions at ES—to be based on hard linguistic evidence, not introspection. —Angr 19:12, 14 March 2013 (UTC)
So "this stuff" is basically anything that lacks the scientific rigour we expect from etymological discussions, and which is therefore not of any concrete use for the dictionary? I suppose that since his posts aren't intended to improve Wiktionary's content directly (he should know how Wiktionary works by now), he's really just trying to stir up discussion, which falls under our "Wiktionary is not a forum" rule. We could probably use that as a justification to move his posts. —CodeCat 03:04, 15 March 2013 (UTC)

Template:bot owner

On the English Wikipedia, they have a template called Template:User bot owner. I say that we should import it, because it provides a standardised way to include very relevant information about a user's activities in an easily accessible format. A short earlier discussion on the topic is here. --Njardarlogar (talk) 08:43, 12 March 2013 (UTC)

Instead of import, just make a Wiktionary version. Mglovesfun (talk) 12:15, 12 March 2013 (UTC)
It's afoul of WT:UBV unless and until there's consensus allowing it.​—msh210 (talk) 14:41, 12 March 2013 (UTC)
So are the userboxes for script competence ({{User Latn}}, {{User Grek-4}}, {{User Cyrl-3}} etc.) and the userboxes for coding ability ({{User template-2}}, {{User Lua-0}}, etc.) —Angr 14:59, 12 March 2013 (UTC)
Uh, no. We had discussion and nobody complained, so it's basically consensus. UBV specifically allows exceptions when supported by consensus. —Μετάknowledgediscuss/deeds 23:46, 12 March 2013 (UTC)
I think the general consensus is that anything that's on your user page that supports your work here is ok. This user box fits that, so I support it and any other future proposals. I think a separate requirement just for userboxes is a bit silly, when it's the general idea that counts. —CodeCat 23:57, 12 March 2013 (UTC)
WT:NPOV still says, "Language-proficiency userboxes are encouraged, and may be added easily using {{Babel}}. All other userboxes are currently forbidden (though specific exceptions may be made, after discussion)." Nowhere does it say that script and coding userboxes have been deemed acceptable, nor does it provide a link to any discussion or vote permitting them. I don't personally oppose them (at least, no more than I oppose the language-proficiency userboxes, which I consider silly and which I only have on my user page because it was expected of me when I ran for admin), but as far as I can tell they are a violation of the letter, if not the spirit, of the law around here. —Angr 21:58, 13 March 2013 (UTC)
Yes the rules are out of date. I bet there's no appetite to update them either. If so, the status quo is the best solution. Mglovesfun (talk) 22:26, 13 March 2013 (UTC)
Lua is a language. The vote doesn’t specify that it must be human language. — Ungoliant (Falai) 11:23, 14 March 2013 (UTC)
Interesting. So if I were to create userboxes for all the different programming languages I'm proficient in, you don't think I'd be violating the policy? —RuakhTALK 15:23, 14 March 2013 (UTC)
I suppose it would be a violation of the spirit but not the letter of the law. I don't think Ungoliant MMDCCLXIV is suggesting we actually do it, just pointing out that the language used is ambiguous. Mglovesfun (talk) 15:28, 14 March 2013 (UTC)
When I created user boxes for coding, I just wondered "is this useful for Wiktionary?" And I think it is. —CodeCat 15:30, 14 March 2013 (UTC)
Not in violation of what’s written, no. But, IMO, coding proficiency userboxes (for the languages and whatnot we use here) are as important as the human language userboxes. — Ungoliant (Falai) 15:58, 14 March 2013 (UTC)

I've created it at Template:bot owner. --Njardarlogar (talk) 19:20, 29 March 2013 (UTC)

I've moved it to {{User bot owner}} because I think it's customary for user templates to begin with "user" to make them recognisable. —CodeCat 21:32, 29 March 2013 (UTC)
Yeah, it was a rather hasty import. I created the /core bit since I am lazy. It should do its job, though. --Njardarlogar (talk) 22:25, 29 March 2013 (UTC)

Apple touch icon

I just discovered this: This is the image used when Wiktionary is saved as a home screen icon on an iOS device. Where the heck did that logo come from? Michael Z. 2013-03-14 05:24 z

That's really random, but honestly I like the look of it. No idea whence it came. —Μετάknowledgediscuss/deeds 22:42, 14 March 2013 (UTC)
Too cluttered for my taste: text on a background of text? Equinox 22:44, 14 March 2013 (UTC)
Anyone know how to go about replacing this weirdness with the newly-official favicon? I found a reference at mw:Manual:$wgAppleTouchIcon, but that doesn’t offer any help on how to upload a replacement file. Michael Z. 2013-03-14 23:16 z
I see. The documentation is quite clear; it says that the value can be a "relative path or [an] absolute URL". All we have to do is create a bug to let the devs know we want the same value as whatever our favicon has. I'll start the bug if nobody else does. —Μετάknowledgediscuss/deeds 00:19, 15 March 2013 (UTC)
Please do. But we also need to make appropriately-sized versions of the png file, and get them uploaded to that standard URL.[1] I’d be glad to prepare a file, but I don’t know where to find the logo. It’s not in commons:Category:Wiktionary_icons. All I can find is File:favicon.png. Is there one larger than 32×32 px at all? Michael Z. 2013-03-15 01:14 z
Can't we just blow up the 32x32 and reupload? —Μετάknowledgediscuss/deeds 01:40, 15 March 2013 (UTC)
It won’t look good. Here are specimens at the standard sizes. Michael Z. 2013-03-15 02:57 z
Is there no SVG version? —CodeCat 02:58, 15 March 2013 (UTC)
I'm not good with graphics, but we can go to Commons and get someone to retouch it for us. —Μετάknowledgediscuss/deeds 03:51, 15 March 2013 (UTC)
I was thinking of something like this. Of course when it is used, iOS will give it rounded corners and a glossy reflection. Michael Z. 2013-03-15 14:06 z
I like it! —Μετάknowledgediscuss/deeds 18:57, 15 March 2013 (UTC)

Filed bug 46431: Update Apple touch icon for en.wiktionary.org. Michael Z. 2013-03-21 19:35 z

Etymology of Entomology

Not listened to it meself yet, but (for the next two days) this BBC radio programme is available: [2]. Listening from outside the UK will require technical jiggery-pokery due to their restrictions. Equinox 20:21, 14 March 2013 (UTC)


I have created a categorisation system for modules which I ask all editors who may create modules to look at for future reference. The subcategories of Category:Modules may seem somewhat empty to you due to some modules not being listed yet (if you want, you can do a null edit to cause them to be categorised). Don't let that fool you — all non-experimental modules that currently exist are categorised in a subcategory, each of which has a blurb explaining its function at the top. Categories are placed on /doc subpages (not /documentation, which is invalid).

If you have any problem with the categorisation system, now is the time to voice your ideas, before the number of modules grows unmanageable. —Μετάknowledgediscuss/deeds 05:20, 15 March 2013 (UTC)

What do you mean, /documentation is invalid? Also, I think it's better if modules are categorised together with templates. It doesn't make sense if templates are kept separate from the modules they use when they are really closely tied to them. Or had you intended to do both? —CodeCat 14:18, 15 March 2013 (UTC)
Re /documentation: The change hadn't happened yet, my mistake. More discussion at User talk:Metaknowledge#Could you move the documentation subpages back please?.
Re modules categorised with templates: They are. Take a look. —Μετάknowledgediscuss/deeds 18:56, 15 March 2013 (UTC)


Now that we've had WebFonts for a few months — what do people think of it? Do we want to keep it?

There's some discussion at Wiktionary:Grease pit/2013/February#English Main Page Has Started Crashing in Safari - May Be Font Download Problem of what appears to be a problem with it, but I think that how we proceed with that problem depends on how we feel about WebFonts in general. (Either way we'll presumably open a ticket, but the ticket can be "WebFonts has this problem that should be fixed", or it can be "Please remove WebFonts!")

RuakhTALK 07:26, 16 March 2013 (UTC)

I can't really say because I don't think it's actually ever been necessary for me. On the other hand, I have noticed that some Javascripts are replacing fonts while the page is loading, which looks a bit strange to me. —CodeCat 15:03, 16 March 2013 (UTC)
I wish these could be off by default and enabled selectively, preferably per browser.
On a desktop I am loading 1.6 MB of fonts which are absolutely unnecessary, because I already have fonts for all of the web fonts languages. On a mobile I presume that I am loading 1.6 MB of fonts, risking a significant increase in my monthly bill, and I still see boxes for some scripts.
Are there any options in how these are set up? Can readers have any control? Michael Z. 2013-03-16 15:27 z
Special:Preferences (Appearance tab, down the bottom) has an option to turn it off, or on. No finer control is possible unless ULS is installed. This, that and the other (talk) 10:49, 17 March 2013 (UTC)
ULS ??? DCDuring TALK 11:41, 17 March 2013 (UTC)
mw:ULS. --Yair rand (talk) 18:48, 17 March 2013 (UTC)
Thanks, Yair.
I don't know. --Dan Polansky (talk) 12:14, 17 March 2013 (UTC)
A very large part of our content is completely unusable for many/most users without WebFonts. I don't think removing it without a replacement is at all feasible. --Yair rand (talk) 18:48, 17 March 2013 (UTC)
What triggers a WebFonts download? Do they stay downloaded as long as one's computer is on, or only as long as one's browser, window, or tab is open?
If they occur when one first opens Wiktionary, then occasional users on low-bandwidth connections may already be finding Wiktionary unusable. DCDuring TALK 22:29, 17 March 2013 (UTC)
I believe it's only the fonts used by a particular page that are loaded when you visit that page. I'm hoping that WebFonts doesn't force you to download them again when they're already in your browser's cache from the last time you visited a page with those fonts, but I don't know for sure. I've turned off web fonts for now after having my browser crash when I visit pages that use Burmese fonts (I run Firefox 16.02 on a Mac with OSX 10.5.8, and I already have a Burmese font installed. Safari 5.06 has no problem on the same pages). Chuck Entz (talk) 23:08, 17 March 2013 (UTC)
(Re Yair.) Assume, temporarily and arguendo, that people interested in a certain language have a Unicode font for it. Is everything visible to every interested party, then, even without WebFonts?​—msh210 (talk) 06:58, 19 March 2013 (UTC)
I dislike it as a default because of page-load-time issues. I have no objection to having it as an option.​—msh210 (talk) 06:59, 19 March 2013 (UTC)

I am just glad we don't have a CJK font as part of WebFonts yet. That would totally break down the entire infrastructure. -- Liliana 16:02, 19 March 2013 (UTC)

I just turned the Webfonts option back on and reloaded this page. It loaded one font, Deva/Lohit-Devanagari.woff. When I check the url’s headers with curl -I, it returns Cache-Control: max-age=2592000, which I believe is telling your browser not to cache the resource for over 30 days. Of course, your browser could purge its cache much sooner on its own.

32 days would be better, because that might at least not force it to download twice in one billing period. Michael Z. 2013-03-19 16:14 z

Vote for bug 46327: Don’t purge cache twice in one billing period, for webfonts and other large resources. Michael Z. 2013-03-19 16:25 z

Just one minor correction: 259200 seconds is 3 days. One more zero is needed for 30 days. --biblbroksдискашн 20:37, 19 March 2013 (UTC)
Typo. The value is, indeed, 2592000 sec = 30 d. Michael Z. 2013-03-21 02:21 z
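For anyone wanting to check the arithmetic, here is a small sketch that pulls max-age out of a Cache-Control header value and converts it to days. The helper name is hypothetical and the header strings are just the values quoted in this thread.

```python
import re

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_age_days(cache_control: str) -> float:
    """Return the max-age directive of a Cache-Control value, in days."""
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        raise ValueError("no max-age directive found")
    return int(match.group(1)) / SECONDS_PER_DAY


print(max_age_days("max-age=2592000"))  # the value actually returned
print(max_age_days("max-age=259200"))   # the mistyped value
print(32 * SECONDS_PER_DAY)             # seconds needed for 32 days
```

The first value works out to 30 days and the second to 3 days; a 32-day lifetime would need max-age=2764800.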

Appendix:Unicode/CJK Radicals Supplement

I just wanted to get clarification regarding the Chinese/CJKV characters that are part of Appendix:Unicode/CJK Radicals Supplement. These are variations of CJK radicals that are not located in the other CJKV Unicode ranges. I attempted to make a redirect of one of these characters to a wiktionary article page of that radical's parent character several months ago but was told not to do that. The thing is, all of the characters in the CJK Radicals Supplement range that aren't red links are redirects (such as the one I attempted to make) dating back to 2010 or so. What does the community think the best course of action is? Treat these as individual entries in respect to being in a separate Unicode range than the other CJKV characters or to continue making redirects to the parent character entries (located in the main CJK Unified Ideographs ranges) that contain the main definition information? Bumm13 (talk) 14:29, 16 March 2013 (UTC)

Proposal on Meta that the WMF fund or take over WebCite

WebCite is currently in financial trouble; unless they can raise enough money to go on, they may have to stop accepting new pages or even delete pages they currently host. It's a good thing we don't consider them durable! Wikipedia, however, did rely on them, so there is currently discussion on Meta of the WMF funding WebCite, either by giving them a grant or by taking over the service. WebCite is receptive to the idea; some thoughts from w:User:Philippe (WMF) are here. You may wish to read and contribute to the discussion on Meta here. - -sche (discuss) 18:18, 16 March 2013 (UTC)

If it is hosted by WMF, will that make it durable for our purposes? On one hand, if WebCite goes down because Wikimedia is having trouble, that probably means Wiktionary has other problems to worry about itself. But on the other hand, Wiktionary can be mirrored so it(s content) could live on after Wiktionary itself goes down. —CodeCat 21:54, 16 March 2013 (UTC)
If WebCite goes down, it will not be because "Wikimedia is having trouble", but instead because Wikimedia editors can't be trusted to come to consensus. I hope that WMF hosts it, or at least donates. —Μετάknowledgediscuss/deeds 22:02, 16 March 2013 (UTC)
I don't think that's what CodeCat means. I think she's suggesting that if WMF does start to host WebCite, then maybe we might as well start considering it durable, because if it were to go down due to WMF having trouble (at some hypothetical future date), then Wiktionary itself would be in doubt (except perhaps for mirrors). —RuakhTALK 08:00, 17 March 2013 (UTC)
Ah, I see. Sorry, English is a really odd language. She was evidently using the present tense (well, present progressive and then simple present in a logically connected clause set) to refer to the hypothetical (i.e. long-term) future whereas I thought it was referring to the known (i.e. short-term) future. Why oh why did we have to give up on the subjunctive! —Μετάknowledgediscuss/deeds 16:15, 17 March 2013 (UTC)


I have added a lot of information to this page, particularly concerning common practice on Wiktionary which isn't documented anywhere else. I hope it is useful, and I also hope that it is enough to get the idea across. I've tried to go for a more normative/prescriptive approach, so that it helps new users decide more easily which practice to follow and which not, because there is so much old/historical code around it may confuse people otherwise.

Wiktionary:Headword-line templates and Wiktionary:Inflection-table templates should probably be deleted. They've already been nominated for deletion, but I'm not sure if anyone wanted to keep their contents. —CodeCat 21:52, 16 March 2013 (UTC)

Many, if not most, of the headword-line templates do not follow your advice. Perhaps it would be advisable to go through and standardise them, if you know how to find offenders. —Μετάknowledgediscuss/deeds 22:04, 16 March 2013 (UTC)
I know, and I don't know if we'll ever be able to fix all of them. The lack of headword-line templates in many entries is a much bigger problem in my opinion, though. —CodeCat 23:20, 16 March 2013 (UTC)
Agreed. Unfortunately for me, that's basically a bot/AWB job. —Μετάknowledgediscuss/deeds 00:25, 17 March 2013 (UTC)
I could try to do it with Python, but the problem is finding them. There are probably tens of thousands of them, maybe even a hundred thousand. —CodeCat 00:57, 17 March 2013 (UTC)
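One way the "finding them" step might be approached offline, on page text taken from a dump, is sketched below. It is only a heuristic under stated assumptions: `has_bold_headword` is a hypothetical helper, and it simply flags pages where the first non-blank line after a level-3 (or deeper) header opens with bold markup instead of a template.

```python
import re

# Matches a level-3 or deeper header such as "===Noun===".
HEADER = re.compile(r"^={3,}\s*[^=]+?\s*={3,}\s*$")
# Matches a line that opens with bold wiki markup, e.g. "'''accuser'''".
BOLD_LINE = re.compile(r"^'''[^']")


def has_bold_headword(wikitext: str) -> bool:
    """Return True if some header is directly followed by a bold line."""
    lines = wikitext.splitlines()
    for i, line in enumerate(lines):
        if not HEADER.match(line):
            continue
        # Inspect only the first non-blank line after the header.
        for following in lines[i + 1:]:
            if following.strip():
                if BOLD_LINE.match(following):
                    return True
                break
    return False
```

Pages flagged this way would still need a human or a more careful bot to decide which headword-line template belongs there.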
ELE states that the "inflection word" should be "(using the correct Part of Speech template or the word in bold letters)". Are you saying that is no longer allowable? SemperBlotto (talk) 07:51, 17 March 2013 (UTC)
I don't think so, and as far as I know many others agree too. Several editors are trying to be more consistent in the way we mark text in a given language, by applying the lang= attribute wherever it is appropriate. I think Michael Zajac in particular is championing that approach and I agree with him. On the other hand, others like DCDuring don't seem to care much about those details and favour a more "old school" approach to HTML (one based on the premise that the underlying code serves only to produce the correct result visually; an approach which is strictly deprecated by the HTML standard). I've written that page partly to reflect how I think we should do things rather than how we have done things in the past. If there is disagreement (Ruakh seems to disagree because he changed some things) then I think this is a good time to come to a consensus, because this will only get worse as time goes on. —CodeCat 14:23, 17 March 2013 (UTC)
I agree that we ought to scrap bold letter headword-lines. Our templates, even just {{head}}, have useful functions and can now be endowed with even greater powers than before. The HTML is important as well, although that's less central to me. —Μετάknowledgediscuss/deeds 16:57, 17 March 2013 (UTC)
Re: "Ruakh seems to disagree because he changed some things": Mostly I disagree with the notion of having a page that sounds like it's describing consensual current practice but actually describes one editor's personal preferences. For the record, I agree that we need lang="" (except maybe for English headwords), though not with the other aspects of that proposal. —RuakhTALK 17:26, 17 March 2013 (UTC)
It's not just one editor's preferences though. I know better than to push my own POV that hard... —CodeCat 17:27, 17 March 2013 (UTC)
Above, you wrote "I've written that page partly to reflect how I think we should do things rather than how we have done things in the past", and I believe that to be a correct statement. I'm not suggesting that all of your preferences are unique to you, merely that the page is accurate only when read as a description of your preferences. —RuakhTALK 17:47, 17 March 2013 (UTC)
Ok, that is true. But the alternative, writing purely descriptively, didn't seem very helpful at all. It would be a guide that doesn't guide... —CodeCat 17:56, 17 March 2013 (UTC)

Agent noun in definitions

Let us remove "Agent noun of accuse" from the definition of "accuser" and proceed similarly with other agent nouns. The phrase proposed for removal is not a part of a gloss definition, as it speaks of the noun rather than of the things to which the noun refers. Moreover, the phrase is almost always redundant: a noun defined as "one who accuses" is thereby an agent noun. Definitions in which the phrase is not redundant can be rephrased to make it redundant. Nonetheless, agent nouns can be placed in Category:English agent nouns. --Dan Polansky (talk) 12:02, 17 March 2013 (UTC)

I agree. — Ungoliant (Falai) 14:26, 17 March 2013 (UTC)
Yes, I don't like these, either. Equinox 14:28, 17 March 2013 (UTC)
  1.   Agree DCDuring TALK 15:54, 17 March 2013 (UTC) Excellent reform.
I've never seen one of these, but yes go ahead. Mglovesfun (talk) 16:13, 17 March 2013 (UTC)
Sounds good. Ƿidsiþ 16:21, 17 March 2013 (UTC)
For reference, as accuser entry no longer has "Agent noun of accuse", you can find the phrase as part of definition in this revision. --Dan Polansky (talk) 16:48, 17 March 2013 (UTC)
  •   Disagree. Agent nouns have distinctive syntax and semantics, and I think it's helpful to identify them explicitly. For example, "a lover of men" means "one who {loves men}", not really "{one who loves} of men". "His murderer" means "the one who murdered him", not "the one who murders him". "The grass-puller" can mean "the one who was pulling grass".[3] But you're right that it's a non-gloss statement, and should be in {{non-gloss definition}}. —RuakhTALK 17:34, 17 March 2013 (UTC)
  • That's true sometimes, and not at other times, e.g. "the market-gardener" isn't someone who gardens markets. Is there a way to identify this "distinctive syntax and semantics"? Equinox 22:48, 18 March 2013 (UTC)
  • But "the vegetable gardener" is someone who gardens vegetables, and "the tomato's gardener" is (potentially) the person who gardened the tomato. Agent nouns are nouns, of course, with all the usual properties of nouns — but they also have their own distinctive syntax and semantics. —RuakhTALK 23:00, 18 March 2013 (UTC)
Some of these come from {{new en agent noun}}. Mglovesfun (talk) 10:11, 18 March 2013 (UTC)
I created that template, and many of the entries employing it, following this Tea Room discussion. My impression from that discussion (and from the content of the agent noun entry) was that every noun formed by describing the "foo"-er (or "foo"-or) of verb "foo", whether that be a thinker, tinkerer, or editor, was an agent noun. I apologize if my impression was mistaken. There is also some discussion of agent nouns here. I do think that to the extent terms are properly classified as agent nouns, they should be identified as such. bd2412 T 01:26, 19 March 2013 (UTC)
Is judge#Noun an agent noun of judge#Verb? Is doctor#Noun an agent noun of doctor#Verb? Is typist#Noun an agent noun of type#Verb? IOW, does there need to be a specific diachronic sequence to the formation of the noun? Alternatively, does there have to be a specific morphological relationship? DCDuring TALK 02:15, 19 March 2013 (UTC)
I have no idea. The only ones that had occurred to me were the type that I asked about in that previous discussion. bd2412 T 03:52, 19 March 2013 (UTC)

Banning Wonderfool from requesting automatic inflection

Wonderfool doesn’t care about correctness when requesting automatic inflection. His User:Pofficerbot and User:Dawnraybot created a huge mess of wrong inflected forms ([4][5]), some of which were found out just recently ([6], amanheêsseis, amanheêssemos, amanheêreis, amanheêramos). Yesterday he requested that User:BuchmeierBot create the forms of salpimientar ([7]), but the conjugation table was incorrect ([8]). Therefore I think every sockpuppet he ever creates should be banned from requesting automatic inflection. If he needs it, he can ask someone else to check the table and request inflection. — Ungoliant (Falai) 13:09, 18 March 2013 (UTC)

I suppose it wouldn't hurt, but it probably would be about as effective as the ban on running an unauthorized bot was yesterday. Chuck Entz (talk) 13:39, 18 March 2013 (UTC)
Hold on, how did you conclude that User:Razorflame = Wonderfool? -- Liliana 13:48, 18 March 2013 (UTC)
Dawnraybot wasn’t WF?? Oops. Still, we should prevent WF from requesting wrong forms. — Ungoliant (Falai) 13:59, 18 March 2013 (UTC)
Dawnraybot definitely was WF, as was Pofficerbot. I think Liliana was reacting to the presence of Razorflamebot here Chuck Entz (talk) 14:05, 18 March 2013 (UTC)
Actually, I think I confused it with User:Darkicebot, which was RF. The names sound too similar. -- Liliana 14:33, 18 March 2013 (UTC)
I don't really feel like enforcing this myself, because I don't really care much for the whole Wonderfool hunt. So I won't be checking all of User:MewBot/feedme, but if someone else wants to check it they're free to. —CodeCat 17:10, 18 March 2013 (UTC)
  • The example you gave about salpimentar was a great example of how useful WF actually is. Please bear in mind that the base form of the verb was correct, even though the conjugation table was wrong. However, also bear in mind that there is an entry for salpimientar. The page [[salpimientar]] was created by one bureaucrat, then edited by another a few years later, and then the bot forms were created by an administrator. A point worth making is that salpimientar is a misspelling of salpimentar, and that nobody realized the error until now. --Fullupfrompizza (talk) 00:29, 19 March 2013 (UTC)
I’m not disputing that you are useful. Your work on Asturian has been good. However, you don’t check whether the conjugation is correct, creating a mess that can take half a decade for someone to find when it’s not. — Ungoliant (Falai) 00:38, 19 March 2013 (UTC)
Nobody's perfect, dude. All bot users have surely created hundreds of erroneous entries throughout their time. But they will all get found eventually. That's the beauty of wikis. See you on Wednesday. --Fullupfrompizza (talk) 00:42, 19 March 2013 (UTC)

Category:en:Geological periods

As I've filled fr:Catégorie:Périodes géologiques en français, I'm now able to create the roughly 150 pages of Category:en:Geological periods, and 150 for their corresponding adjectives, from User:Christian COGNEAUX/English geologic names.

My script is ready, do you think that I could launch my bot without the flag for this small unique mission please? JackPotte (talk) 15:46, 19 March 2013 (UTC)

  • Go for it. We can always chop it if needs be. SemperBlotto (talk) 15:56, 19 March 2013 (UTC)

In conclusion I've created the category and all the corresponding adjectives in lower case. However, two problems remain:

  1. Equinox showed me that there was no attestation of gzhelian, but only Gzhelian as an adjective. Reverso gave me an example in lower case, but after double-checking it's the minority of the cases (should we delete them?).
  2. Some entries were already created as proper nouns: Permian, Pridoli, Turonian and Maastrichtian, but I don't find any dictionary telling that. JackPotte (talk) 19:00, 19 March 2013 (UTC)
  • If you can't attest a word, post it on WT:RFV. Entries like Maastrichtian should be fixed; be bold. But your bot-added terms have some formatting errors, and I'd appreciate if you could fix those by bot. See the changes I made here for example. Minor to be sure, but local template usage is pretty important in unifying formatting. Thanks! —Μετάknowledgediscuss/deeds 22:31, 19 March 2013 (UTC)
Thank you, I've done the corrections and suggestions. JackPotte (talk) 23:32, 19 March 2013 (UTC)
I've temporarily autopatrolled the bot; I'll be removing that now. Please post a few minutes before you plan to run it so that an admin can do this. I don't think I've ever seen the lowercase forms in English (and I'm well-versed in matters geological), and checking each one individually would be extremely tedious. Do you have a solution to that? —Μετάknowledgediscuss/deeds 23:41, 19 March 2013 (UTC)
Any administrator can launch, from a file which content would be (in ASCII) all the pages I've created in lower case (not those which existed before): I can publish the list into Wiktionary:Requests for deletion/Others. JackPotte (talk) 08:07, 20 March 2013 (UTC)

JackPotte (talk) 19:29, 20 March 2013 (UTC)

Probably WT:RFV would be better, to at least give them a chance... —Μετάknowledgediscuss/deeds 01:35, 21 March 2013 (UTC)

meta:Wiktionary future

Image 1: Redundencies[sic].

It has been reported on Wikidata talk:Wiktionary#Project_page_on_meta (though not, oddly, here on Wiktionary) that "a page was created on meta to coordinate propositions concerning wiktionary future"[sic]. Among other things, it is proposed that "all WIKIs containing elements of one and only one foreign language (relativ[sic] to the general language of the Wiktionary) should be eliminated [] this means 1: Haus in the English and French Wiktionary should be eliminated; Maison in the German and English Wiktionary should be eliminated; house in the German and French Wiktionary should be eliminated and substited[sic] by another ENTITY as described later". The page is in dire need of both competent input from actual Wiktionary editors and orthographical and grammatical cleanup; please help out! - -sche (discuss) 21:14, 19 March 2013 (UTC)

Some of what this fellow says makes sense (better semantic markup, better identification of important data); some of it is pure madness (removing any entry pages that are only for one language -- what happens when editors want to add another language later, such as Luxembourgish at [[Haus]]? -- what about target-language (in our case, English) descriptive and explanatory text? etc. etc.).
The page reads a bit like buzzword bingo: "The current mostly text- and mark-up-based structures with more or less accidental quality of data-content should be moved, step by step, into a better apt, more easily understandable and usable data model, representing the real long term needs of the subject. The view is that of a cross-Wiktionary user." Huh? And how many of these "cross-Wiktionary users" exist?
The proposal seems to be completely oblivious to the strikingly different ideas about linguistics and grammar that are evidenced by the various user communities. Entry structure over at the RU WT or the ZH WT is quite different from how we do things here, for instance.
And that's just for starters. The underlying idea (better data portability and sharing) is a good one; this half-dreamt extrapolation of that idea is frighteningly like a bulldozer revving in front of my house and aimed at the living room. -- Eiríkr Útlendi │ Tala við mig 21:44, 19 March 2013 (UTC)
There's Omega/Z already; what does this do that that doesn't? —This comment was unsigned.
I think the difference is that this would require every Wiktionary to change dramatically and converge toward a single entity (considering the fact that the communities can't even agree on a freaking common logo, I don't think this is realistic unless the proposed solution is rock solid). While Omega seems to be separate and complementary to Wiktionary. Dakdada (talk) 09:37, 20 March 2013 (UTC)

meta:Requests for comment/Adopt OmegaWiki

One more Meta proposal which would affect Wiktionary: the proposal, by GerardM, Kip et al, that the WMF adopt OmegaWiki. OmegaWiki, formerly known as WiktionaryZ, is a WikiData-esque "project to produce a free, multilingual dictionary in every language" using a relational database "not based on 'words', but on the concept of Defined Meaning". I've just left my thoughts; consider leaving yours! - -sche (discuss) 08:54, 20 March 2013 (UTC)

Wiktionary and Wikidata

The above propositions about the Far Future of Wiktionary and OmegaWiki are fine, but I'm more interested in the near future, i.e. how can we use Wikidata without changing everything everywhere. Also, I'd like to see more synergy between projects when working on things that could be used by everyone.

As you know, Phase I of Wikidata for Wiktionaries is already being worked on. The purpose of this phase is to move the interlanguage links to Wikidata (no more interwiki bots !). This should not be too difficult, and it will probably be mostly automated.

What I'm interested in would be the Phase II. There have been talks about moving pronunciations, declensions etc. to Wikidata, but I think it's unrealistic right now given the heterogeneity of Wiktionaries on these matters. I believe we should focus on common data that can be shared but are independent of the communities. Such data would be, for example :

  • Transliterations
  • Sort keys (or collation for categories)

Both are standardized and independent of the communities. Only the word and its script or language are required. Work on Lua modules is already under way, but each Wiktionary seems to be working on its own solution, which is a waste. We should either use Wikidata to store and reuse the transliterations/sort keys, or develop common libraries for Lua. The advantage of Wikidata would be that the values would not have to be computed every time.

I'm sure there are other data like that that could be shared realistically between Wiktionaries (either with Wikidata or common Lua libraries). Dakdada (talk) 10:03, 20 March 2013 (UTC)

Both of these are probably better handled using Lua than by storing them in some place. I think sharing Lua would be beneficial, but it would be hard because modules rely on infrastructure that may not be there on another wiki. —CodeCat 11:23, 20 March 2013 (UTC)
I'm hoping for some libraries directly included in the Lua extension, like mw:Extension:Scribunto/Lua_reference_manual#Language_library. There should be some transliteration and collation libraries that could be used, instead of writing everything all over again on every wiki.
As for sharing code, it may be good to design them so that they could be used elsewhere (be it on other Wiktionaries or other sister projects). Otherwise there will be a lot of wheels reinvented. Dakdada (talk) 11:34, 20 March 2013 (UTC)
Sorry, but your assumptions just aren't true. Transliterations aren't "standardized and independent of the communities"; in fact the opposite is true. Sort keys might be, but that's even easier to solve with local Lua modules. I personally haven't seen anything in Wiktionary Phase II discussions that I could support. It seems to be badly thought out by people who aren't at all familiar with the differences between the major Wiktionaries. —Μετάknowledgediscuss/deeds 01:42, 21 March 2013 (UTC)
Frankly, I think Metaknowledge hits the nail on the head: the whole idea of shifting content to WikiData—first the idea of moving pronunciations and translations, now the idea of moving transliterations—"seems to be badly thought out by people who aren't at all familiar with the differences between the major Wiktionaries." - -sche (discuss) 03:06, 21 March 2013 (UTC)
Putting transliterations and sort keys on a central database would be workable for Japanese and helpful, especially to other Wiktionaries. We do it well on here and I expect other sites would like to imitate us.
  1. As for sort keys: Japanese entries have a sorting issue that we (mostly I myself) deal with in a very menial and labor-intensive way. We sort Japanese entries on here the same way that Japanese dictionaries do, which is different from the way the servers do it automatically, so for every entry where it's different, we have to add another key (or several other keys) to tell the servers "sort this as if it were actually this." For example, changing everything to Latin transliteration, "gorira" (see ゴリラ) would normally sort under "go" but we want it under "ko", so we have to force it to be sorted by "korira'" (with a trailing apostrophe). The apostrophe forces it to follow "korira" if that exists. Repeat for every category link and context template. It's an ugly hack and very few editors understand it. I have over 43,000 edits and a lot of them were dealing with this issue. Other WTs probably have the same issue and would want to do the same thing. Sort keys for Japanese are the same for every WT as long as they sort in the Japanese style. A quick look at French (#2 in number of entries) shows that they do too. Italian and Mandarin do not.
  2. Transliterations: Transliterations of Japanese into the phonetic scripts of Japanese, that is, (usually) from kanji to hiragana, are relatively uncontroversial. Those into Roman letters (or as we say, written in romaji) are a source of great disagreement. However, and this is a bit chauvinistic, I think that the method that we use on English WT is the best one, and the most modern, widely accepted one worldwide, and other Wiktionaries would like to use ours. --Haplology (talk) 03:15, 21 March 2013 (UTC)
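The sort-key hack described in point 1 could be automated along the following lines. This is a minimal sketch, not the procedure actually used on any Wiktionary: `ja_sort_key` is a hypothetical name, the key is built in hiragana (the comment above illustrates it in Latin transliteration), and appending exactly one apostrophe whenever a voicing mark was stripped is an assumption.

```python
import unicodedata

# Combining dakuten and handakuten, as produced by NFD decomposition.
VOICING_MARKS = {"\u3099", "\u309a"}


def ja_sort_key(kana: str) -> str:
    """Hiragana sort key with voicing marks removed.

    An apostrophe is appended if any mark was removed, so that the
    voiced form sorts right after its unvoiced twin.
    """
    out = []
    stripped = False
    for ch in unicodedata.normalize("NFD", kana):
        if ch in VOICING_MARKS:
            stripped = True
            continue
        # Katakana U+30A1..U+30F6 sit 0x60 above the matching hiragana.
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)
        out.append(ch)
    return "".join(out) + ("'" if stripped else "")
```

With this sketch, `ja_sort_key("ゴリラ")` gives `こりら'`, which sorts immediately after a plain `こりら`.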
@Haplology: Unfortunately, you haven't refuted my statements, but instead backed them up. All you've said is that the way we do it at English Wiktionary yields the best quality of results (and in this case I'll agree with you), but that other Wiktionaries don't agree. Forcing them to would be as bad as them forcing us: I don't think anyone wants that. More to the point, it looks like both your issues (converting between the phonetic scripts of Japanese and sorting Japanese-style) are perfect for Lua. —Μετάknowledgediscuss/deeds 03:28, 21 March 2013 (UTC)
I've already implemented sorting keys for Dutch and Catalan, see Module:nl-common and Module:ca-common. —CodeCat 03:50, 21 March 2013 (UTC)
Here are the standards for transliteration : List of ISO romanizations. Collations are also standards, and it is more complicated than simply removing diacritics. Both are in the CLDR. If we could have this available in a common Lua library (not Modules, directly in the Extension), we would not have to create separate modules for every language in every Wiktionary, which is what is being done right now. Again, we are reinventing the wheel (and not in the best way).
There are some standards for transliteration. Yes, we should definitely use standardized romanizations, but we should choose appropriate ones. The w:ISO 9:1995 romanization for Cyrillic, for example, is based purely on character glyphs and disregards language differences, so it is probably excellent for multinational document cataloguing, but it is very poor for lexicography. You’ll notice that the CLDR includes some non-ISO systems,[9] and allows for others to be added.[10] Michael Z. 2013-03-21 16:05 z
Also : please don't condescendingly assume that I don't know anything about Wiktionary projects. I was suggesting to share some of the work, either with Wikidata or with Lua, precisely because I know that the projects don't share anything. Dakdada (talk) 10:18, 21 March 2013 (UTC)
I've argued several times before that using sort keys in a multilingual project is still very much a hack, and doesn't solve the real problem. The current ordering can't account for the fact that different languages might order letters differently (for example, Swedish orders ö at the end but Turkish puts it after o). The current system also doesn't allow for languages that might treat character sequences as distinct letters (Hungarian sz, cs, ny etc come to mind). A real solution that I've proposed before is to allow categories themselves to have custom collation orders. Something like a magic word {{COLLATION:nl-NL}} or something similar. —CodeCat 13:47, 21 March 2013 (UTC)
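The per-language ordering problem can be illustrated with a small sketch. Everything here is an assumption made for illustration: `collation_key` is a hypothetical helper, inputs are taken to be lowercase, and the Hungarian alphabet is cut down to plain a-z plus the three digraphs named above. Real collation (for example the CLDR rules mentioned earlier in the thread) is considerably more involved.

```python
def collation_key(word, alphabet):
    """Map a lowercase word to a list of letter ranks in the given alphabet."""
    # Try longer letters first so a digraph like "sz" wins over "s" + "z".
    by_length = sorted(alphabet, key=len, reverse=True)
    rank = {letter: i for i, letter in enumerate(alphabet)}
    key = []
    pos = 0
    while pos < len(word):
        for letter in by_length:
            if word.startswith(letter, pos):
                key.append(rank[letter])
                pos += len(letter)
                break
        else:
            # Unknown characters sort after every known letter, by code point.
            key.append(len(alphabet) + ord(word[pos]))
            pos += 1
    return key


SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")
TURKISH = list("abcçdefgğhıijklmnoöprsştuüvyz")
# Simplified Hungarian: sorting the strings places "cs" right after "c", etc.
HUNGARIAN = sorted(list("abcdefghijklmnopqrstuvwxyz") + ["cs", "ny", "sz"])

swedish_order = sorted(["öl", "ost", "zebra"], key=lambda w: collation_key(w, SWEDISH))
turkish_order = sorted(["pul", "öz", "ot"], key=lambda w: collation_key(w, TURKISH))
hungarian_order = sorted(["csak", "cukor"], key=lambda w: collation_key(w, HUNGARIAN))
```

The same letter ö lands last in the Swedish list and between o and p in the Turkish one, and the Hungarian list puts "cukor" before "csak" because "cs" counts as one letter after "c". A single category-wide sort key cannot express all three orders at once.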
I totally agree. However this is a long sought feature that does not seem to be worked on (I think there is a bug for that lost in the limbo of Bugzilla). Maybe we should try to ask again (or at least make sure there is actually a bug for that).
If, for the time being, we could have something that creates correct sort-keys automatically (for a given language), it would be great. And if this sort-key library in Lua is consistently available in all projects: even better. Oh, and collations can be useful for other things than Categories. Dakdada (talk) 14:03, 21 March 2013 (UTC)
I've re-opened this bug. Please support (vote for the bug) and comment on it! —CodeCat 14:27, 21 March 2013 (UTC)
Thanks! I also left a message on the fr.wikt Beer parlour. Dakdada (talk) 15:33, 21 March 2013 (UTC)
  • For that matter, what about cases where a single spelling needs to be categorized under multiple different sortings? See meta:Help_talk:Category#Any_way_to_sort_under_multiple_sort_keys.3F for an explanation of what this means for Japanese, a language that is quite happy to apply several quite different readings to a single kanji spelling. -- Eiríkr Útlendi │ Tala við mig 16:39, 21 March 2013 (UTC)
    Collation per category is possible, but several keys for one word may require changes to the Mediawiki database itself (among other things), or a whole new dedicated extension, so I'm afraid we'll have to use workarounds for this for a while. How do they manage that on ja.wikt ? Dakdada (talk) 17:04, 21 March 2013 (UTC)
    • How do they manage that on ja.wikt? --> Mostly, they don't. I notice that ja:靖 is listed under せい (sei) in their index of Japanese name kanji, which is only one of the many readings for this character. I can only assume that they ran into this same technical limitation of the MediaWiki back-end and decided to adopt the most common on'yomi for indexing purposes. I also note that they don't have any given names at all (at least, after searching for a while I couldn't find any), and Japanese given names are some of the most inventive when it comes to readings for any given set of kanji. -- Eiríkr Útlendi │ Tala við mig 21:48, 21 March 2013 (UTC)

Template:comparative of and Template:superlative of

I think we should either split these in order to create {{en-comparative of}} and {{en-superlative of}} because of the more ''[[{{{1}}}]]'' (or most ''[[{{{1}}}]]'' ), or remove that bit altogether. Alternatively suppress that bit automatically when lang=en or no language is given at all. So that's three options, or four if you include do nothing (leave it as it is):

  1. split to create en templates
  2. remove the more/most bit altogether
  3. suppress the more/most bit when lang=en or lang is not given
  4. leave it as it is

Mglovesfun (talk) 13:42, 20 March 2013 (UTC)

I don't necessarily agree with creating English-specific versions of all the form-of templates, but for this specific case I do agree and was thinking of proposing the same (option 1). Also, option 3 is the same as option 4 because it's how it works right now. —CodeCat 15:15, 20 March 2013 (UTC)
Also, most Slavic languages form a periphrastic comparative in much the same way as English, and those languages might want to insert their own word for "more" instead. I doubt we'd want to add support for all of those into these templates. —CodeCat 15:17, 20 March 2013 (UTC)

Use of Template:param in template documentation

I just noticed today that Template:param is being used on several template documentation subpages to mark up parameter names, but the way it's being done makes no sense. See Template talk:param for details, and to discuss the issue further. - dcljr (talk) 07:30, 21 March 2013 (UTC)

Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content

Category collation

For those of you who did not read the #Wiktionary and Wikidata discussion, please take a look at the following bug, and add your vote (registration required):

This would allow every category to have its own collation (word order), adapted to the language of its content, instead of the default Unicode. This way we would not have to bother with setting sort keys in every article. Dakdada (talk) 12:43, 25 March 2013 (UTC)

Category collation for kanji

  •   I don't suppose you know if there's a bug report for the problem where a single headword can only ever be indexed under one listing per category?
This is related to collation, as single Japanese entries might need to be collated in multiple places. Take the single kanji 凹, for instance -- all on its own, it can be read variously as kubo, kubomi, nakakubo, hekomi, or boko, all of which are nouns. However, due to the current implementation of categories in the MediaWiki software, even if all of these categories are included in the wikitext, the last one on the page seems to be the only one that works (I'm guessing that this is probably because the MW software looks at each cat in turn and overwrites the previous indexing value, instead of allowing multiples). Consequently, 凹 is only listed under the boko reading in Category:Japanese nouns, when a proper dictionary would list this under all five readings, not just one of them.
If there is such a bug report, please let me know. -- Eiríkr Útlendi │ Tala við mig 15:14, 25 March 2013 (UTC)
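The difference between the guessed behaviour and what a dictionary index would need can be modelled in a few lines. This is a toy model only: it assumes the overwrite explanation guessed at above is right, it is not how MediaWiki is actually implemented, and `index_single` and `index_multi` are hypothetical names.

```python
def index_single(entries):
    """One sort key per page: each later key overwrites the earlier one."""
    index = {}
    for page, sort_key in entries:
        index[page] = sort_key
    return sorted((key, page) for page, key in index.items())


def index_multi(entries):
    """Keep every (sort key, page) pair, so a page may be listed many times."""
    return sorted(set((key, page) for page, key in entries))


# The five readings listed above for the kanji 凹 from this thread.
READINGS = ["kubo", "kubomi", "nakakubo", "hekomi", "boko"]
entries = [("凹", reading) for reading in READINGS]

single = index_single(entries)
multi = index_multi(entries)
```

In the single-key model only the last reading (boko) survives, while the multi-key model yields five listings, one per reading, which is the dictionary-style result being asked for.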
You can report such a bug yourself (on bugzilla), although it would be good to have an idea to propose. One workaround would be to create several redirects which would be categorized like the main article, but with different names corresponding to the reading, e.g. よう (凹). I suppose something like that was already suggested... Dakdada (talk) 16:42, 25 March 2013 (UTC)
That's what I did with Welsh cyngyd, which requires two different alphabetizations: one meaning treats "ng" as a single letter alphabetized between "g" and "h", and the other meaning treats "ng" as two separate letters. I made a redirect from cyn‌gyd (with a zero-width nonjoiner) for the single-letter variant and sorted it as "cygzyd". Kind of kludgey, but it works. —Angr 16:50, 25 March 2013 (UTC)
  •   Yes, I mentioned that in the thread over on Meta. Due to the very large number of possible alt readings for some Japanese kanji, this kind of hack quickly becomes untenable, and very hard to maintain if anything changes. cyngyd is easy enough with just two collations needed. What about , which would need at least 13 collations? Or , which would need somewhere around 30 to account for all the name readings? Even excluding name readings, this character would need eight or nine collations. Manually creating so many hack workaround blank pages just to handle category collations can't be the best way to solve this problem... -- Eiríkr Útlendi │ Tala við mig 06:27, 27 March 2013 (UTC)
    Even though it is a hack, this is nonetheless the way the entries should be displayed: よう (凹) or 凹 (よう). Because having just 凹 in a category doesn't help, especially if it's only there several times. Right now we can only use redirects for that. Maybe Wikidata could help for this. Dakdada (talk) 10:21, 27 March 2013 (UTC)
Why not actually create one of those as an entry and redirect it to 凹? —CodeCat 14:55, 27 March 2013 (UTC)
A soft redirection then? Well, there are already articles about transcriptions, so that would be similar. Dakdada (talk) 15:01, 27 March 2013 (UTC)
  • I think CodeCat intends for this to be a hard redirect, much as the cyngyd redirect that Angr mentions above.
One serious question though about Dakdada's suggestion is how do you intend for the entry to display as you've written? Are you proposing separate pages for each reading of any kanji combination? Notice that we already have kana pages, so 凹 should be listed under よう, おう, くぼ, くぼみ, etc. Are you suggesting that we should have pages listed under the combination of kanji + kana reading for those kanji?
The standard has been to list all readings of a given kanji term within the entry for that term. If you look at the JA entry for 凹, you'll see that each reading has its own etym (as each reading historically has its own distinct derivation), with each etym section showing the reading. -- Eiríkr Útlendi │ Tala við mig 15:28, 27 March 2013 (UTC)
Either we only use kana pages to link to the corresponding kanjis, or we don't and we have to create separate pages for each reading, in the form [kanji (kana)], redirecting to the kanji. Dakdada (talk) 16:12, 27 March 2013 (UTC)
  • Given that we've already been creating kana pages to link to the corresponding kanji entries for several years now, I suggest we just continue with our current m.o. However, this has no bearing on any solution to the collation problem. I'll see about filing a bug report at some point. -- Eiríkr Útlendi │ Tala við mig 17:16, 27 March 2013 (UTC)
Perhaps I'm missing something, but isn't the original poster's concern properly addressed by using "Index:" pages? - dcljr (talk) 08:42, 19 April 2013 (UTC)

"-des" pluralizations

While adding plural categories to nouns with "-es" endings, I came across a few dozen words that are pluralized by adding or changing to a "-des", such as ephelides, which is the plural of ephelis, and lagopodes, which is the plural of lagopus. Are these properly considered "-es" endings, or should they be categorized separately? Also, if they should have their own category, are they irregular plurals? bd2412 T 02:54, 27 March 2013 (UTC)

They're Ancient Greek borrowings that have kept the Greek plural forms. S at the end of an Ancient Greek word tends to absorb most consonants that come into contact with it, but the vowels in the plural endings keep them separate, so you can see the real ending of the stem. These are all words with a hidden -d in the singular that shows itself in the plural. Needless to say, this is all by Ancient Greek rules: the words were either borrowed as a unit with both singular and plural taken directly from the Ancient Greek, or they had the plural endings added back by people trying to imitate the Ancient Greek. I don't think there's any process in English that produces them; it's the fact that they're left unaltered that sets them apart. They're definitely a group, but only because they're all Ancient Greek third declension -d stems that haven't been assimilated to the English ways of forming plurals. Chuck Entz (talk) 05:36, 27 March 2013 (UTC)
[after e/c] If we go by the actual text of Category:English plurals ending in "-es", it's a hodge-podge. It's apparently supposed to include two kinds of plurals:
  • plurals whose spellings are formed by adding <-es> to the singular spellings. (This is already a bit arbitrary, actually, since this is not the same as the set of plurals formed by adding /-əz/ to the singular; note that "heroes" and "tomatoes" use the /-z/ plural, whereas "ridges" and "caches" use the /-əz/ plural even though their singulars are already spelled with <-e>. But it may be useful anyway.)
  • Greek-derived plurals of the -is-es type, as in "analyses" and "diagnoses" and so on. (This is almost completely separate from the first kind. The two seem to have a tiny bit of overlap at the edges, in that you'll sometimes hear people use the /iz/ pronunciation in words like "processes" where the etymology does not support it, but really, they're still almost completely separate. Besides, people also use that pronunciation sometimes for "Reese's pieces", which does not satisfy the criterion given in the category text.)
In addition, the category currently contains various assorted plurals that don't satisfy its description, such as "phalanges" (plural of "phalanx", after Greek) and various plurals in "-ices" of singulars in "-ix" (after Latin).
Personally, I would support splitting this up into more logical groupings. Even just separating the orthographic-addition-of-<-es> group from the Greek-and-Latin-irregularities would be a big improvement.
RuakhTALK 05:41, 27 March 2013 (UTC)
Strongly agree, either that or just allow any plural ending in -es such as pieces, races and so on. One way or the other, but not this. Mglovesfun (talk) 16:23, 27 March 2013 (UTC)
I will see to it this weekend. Cheers! bd2412 T 02:01, 28 March 2013 (UTC)
I have created and populated Category:English irregular plurals ending in "-ces", Category:English irregular plurals ending in "-des" and Category:English irregular plurals ending in "-ges". The question remains whether plurals formed merely by changing a final "-is" to a final "-es" should be categorized separately from other "-es" plurals (and if so, what should the category be named), and whether plurals formed merely by the addition of an "-es" to a singular ending in "-o" should be categorized separately. My opinion is that the difference in pronunciation for words like heroes and mangoes does not make the ending "-es" any different, as that merely follows from the "-o" itself. bd2412 T 03:29, 28 March 2013 (UTC)
I propose to create Category:English irregular plurals ending in "-es", and populate it with plurals for which an "-is" ending becomes an "-es", as with analyses. bd2412 T 17:03, 30 March 2013 (UTC)
Done. Cheers! bd2412 T 01:33, 31 March 2013 (UTC)
Another example I saw today: mamey sometimes keeps its Spanish plural mameyes in English. Equinox 16:25, 27 March 2013 (UTC)

Increasing default font-size

The Vector skin’s default body font-size is 13px. This is about 81% of the HTML and web-browser default of 16px, roughly equivalent to CSS font-size small or 0.8em, or HTML size=2.

Wiktionary’s style sheet (Common.css) has 54 declarations increasing font-size for language scripts, plus one for IPA. Their average value is 123% and median is 125% (e.g., 1.25em is equivalent to 125%), both of which bring 13px text to roughly 16px (= medium, 1.0em, or size=3).
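For anyone who wants to check the arithmetic behind these figures, here is a minimal Python sketch. It assumes only the numbers quoted in this thread (a 13px skin default, a 16px browser default, and the 123% average and 125% median overrides); the function and constant names are made up for illustration and do not come from Common.css.

```python
# Figures quoted in the discussion; names are illustrative only.
BROWSER_DEFAULT_PX = 16
VECTOR_BODY_PX = 13


def effective_px(base_px, percent):
    """Return the rendered size when a percentage override is applied to a base size."""
    return base_px * percent / 100


# How large the skin default is relative to the browser default.
skin_ratio_percent = VECTOR_BODY_PX / BROWSER_DEFAULT_PX * 100

# Sizes produced by the median and average script overrides on a 13px base.
median_override_px = effective_px(VECTOR_BODY_PX, 125)
average_override_px = effective_px(VECTOR_BODY_PX, 123)

print(f"skin default is {skin_ratio_percent:.2f}% of the browser default")
print(f"125% of 13px = {median_override_px}px")
print(f"123% of 13px = {average_override_px:.2f}px")
```

In other words, the typical per-script override roughly cancels out the skin's reduction, which is the basis for the suggestion that most of those overrides would become unnecessary at a 16px base.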

If we simply set the website’s default font-size to the normal HTML default of 16px, then we can remove 44 of these exceptional font declarations, and reduce the contrast of the remaining 11. Advantages to readers and editors would include:

  • Improved readability on small and large screens
  • Better consistency in different scripts and languages
  • Better consistency in font rendering (e.g., stroke-width discrepancy between Latn and Arab text in Arial)
  • Consistency in IPA (e.g., /abcde/ vs. abcde on the same line)
  • Fewer exceptions to futz with in our CSS and templates
  • Paving the way to more modern CSS

I have been browsing with the medium font-size for some days (see User:Mzajac/vector.css), and I find it to be an improvement on both the desktop and the mobile. Michael Z. 2013-03-28 17:51 z

So I am proposing making the font bigger. No objections or comments at all? Michael Z. 2013-04-02 15:32 z

  • No objection (it might even let me move from monobook). SemperBlotto (talk) 15:39, 2 April 2013 (UTC)
    Interesting. For what other reasons have you stuck with Monobook? Michael Z. 2013-04-02 16:02 z

POS labels and different languages

I'm puzzled by the POS labels I see for Lojban. Take rafsi, for instance. This shows up as the POS label in terms such as rin. But what the heck is a rafsi? Apparently, it's the Lojban word for an affix. So why not use the POS label Affix?

Does this imply that we are allowed to use the POS labels of the source language? This would obviate some of the difficulties we JA editors have had in finding a fitting English label for the Japanese POS known as 形容動詞 (keiyō dōshi, literally adjectival verb) (except they aren't at all verbs, and some of them are more like nouns). Functionally, these are basically a class of adjectives, which includes a few specific terms that can also be used as nouns.

However, using the grammar labels of the source language as POS headers introduces new difficulties, as our target audience consists of English-language readers, and English-language readers can't be expected to know what 形容動詞 means, nor what keiyō dōshi means. (Heck, I've got growing reservations about our current header of Adjectival noun for this POS, thinking more that we should just use standard EN grammar labels where possible; and "adjective" would work for this POS... but anyway.)

Similarly, English-language readers can't be expected to know what a rafsi is.

Can anyone explain what the deal is with Lojban? Is it just a weird enough language that Lojban entries are given a pass with regard to WT:ELE? Is it just that no current editors care enough to fix these? Or is this carte blanche to get creative with POS headers, and to heck with WT:ELE? -- Eiríkr Útlendi │ Tala við mig 18:50, 28 March 2013 (UTC)

The problem with Lojban in particular is that it is different. Lojban really doesn't have nouns or verbs. What Lojban calls a gismu corresponds to a noun, an adjective, adverb or a verb. This is because such words are technically predicates: they don't represent an object, but rather a certain truth about an object. To take the top two entries in Category:Lojban gismu... bacru is what we might call a verb, because it expresses that the subject performs an action. But badna is more like a noun because it says what something is. However, this distinction isn't at all meaningful in Lojban itself; "badna" is equally a verb and then means "is a banana", and the two are completely interchangeable (insofar as they take the same number of objects).
For other languages, the problem is similarly that we can't ever hope to adapt the terminology that is appropriate for that language, to English. Some parts of speech are not familiar to English speakers because they don't exist in English. That's something we will just have to cope with. To limit ourselves to the words used to describe English also means that we try to artificially force other languages to fit an English-shaped mold. Imagine if we tried to reverse it, like in a language where every adjective were a verb (this does exist, to greater or lesser degrees, in many languages). Would it be appropriate to give green a "Verb" PoS header, or would we instead use some word that means "Adjective" more exactly, but isn't familiar to many speakers? —CodeCat 19:56, 28 March 2013 (UTC)
WT:About Lojban includes helpful advice like “All text in Wiktionary should be in English” and “For a gismu, list here the lujvo and type-3 fu'ivla derived from it.” Ha!
Are these words attested English terms? Michael Z. 2013-03-28 20:00 z
In the context of discussions about Lojban, I assume most definitely yes. Outside that, no, because they are only meaningful within that context (just like quasar only means something in astronomy). But you can of course RFV them. The only problem is, if they fail, how do we describe Lojban if we have no words to describe it with? —CodeCat 20:05, 28 March 2013 (UTC)
re "how do we describe Lojban if" those terms fail RFV: the go-to header for anything that isn't some other POS is ===Particle===, although I admit describing everything in Lojban as a particle would be ... unhelpful. - -sche (discuss) 20:19, 28 March 2013 (UTC)
  • Some words in Japanese do double duty as nouns and verbs, such as 混雑 (konzatsu, a crush, congestion; to be crowded, to be jammed in or together). In these cases, we have been listing these under both ===Noun=== and ===Verb=== POS headers.
Other words in Japanese function as adjectives, but can also be used predicatively without a verb, such as 良い (yoi, good). In strict functional grammatical analyses, these have been variously described as adjectives, stative verbs, and adjectival verbs, among other things. However, such strict functional grammatical analysis belongs in an encyclopedia article, so for purposes of POS header in EN WT entries, we describe these as adjectives.
If badna in isolation equates to EN noun banana, and as a predicate it equates to the EN verbal phrase is a banana, then it would be much more useful for English-language readers to label this as a ===Noun=== and include links to relevant articles on Lojban grammar that explain how things that function as nouns can also be used in other ways.
Māori could be analyzed as functioning somewhat similarly. He wai tēnā works out literally to A (or some) water that. There is no real verb that means "to be"; you just use the noun.
But what's going on here is relevant to the syntax of the language, and how different words are used in relation to each other. Wai is still a noun, even when used predicatively -- the word is a label for a person, place, or thing, ergo it is a noun (hearkening back a bit to Schoolhouse Rock grammar lessons). Similarly, I would argue that badna is a "noun" for purposes of discussion in English. Calling this a "gismu" with no other POS or grammatical information (and not even any links to the term gismu) is just obtuse and unhelpful when your target audience is English-language learners, who cannot be assumed to have any foreknowledge of Lojban.
For that matter, the definitions given in the gismu entry are not exactly helpful either -- apparently readers need to know Lojban and obscure notation before they can make any sense of purportedly English-language definitions of Lojban terms. Not very user-friendly.
If the EN WT is intended to be a dictionary of many languages into English, then we must use English to describe the source-language terms. Using the source language to describe the source language fails at this. -- Eiríkr Útlendi │ Tala við mig 21:12, 28 March 2013 (UTC)
Why should Wiktionary make up its own terminology instead of using the words that are normal in a given field? Within Lojban discussions, gismu and so on are the standard words. If we don't use them, then yes, we might no longer confuse the occasional person who isn't familiar with those terms. But we would now be confusing the vast majority of Lojbanists who now no longer can make sense of our definitions. And as I tried to argue (but this point seems to have missed you), in Lojban, nouns and verbs are the same thing. There is no distinction whatsoever between "words for things" and "words for actions", both are the same, indistinguishable and interchangeable. Distinguishing them is artificial, and would be an attempt at best to fit them into an English-shaped mold. If we decide to distinguish them, how would we make the distinction? There is nothing within Lojban itself that can give any clue as to what is a verb and what is a noun, so editors will be faced, with every entry, with the completely arbitrary decision of whether a word is a verb, noun, adjective and so on. Because those things don't exist in Lojban. —CodeCat 21:25, 28 March 2013 (UTC)
Our readers mainly do read English and do not read Lojban. This project is a general grammar discussion and not a Lojban one. w:Lojban grammar glosses gismu as “root word,” so I don’t think it would be unreasonable to use that as a POS heading, and explain the details in WT:About Lojban. Anyway, I won’t be convinced to support a “Gismu” header until I can understand our English definition of gismu. Michael Z. 2013-03-28 21:38 z
(after edit conflict)
  • @CodeCat, it sounds like you're arguing that the word "noun" is made-up terminology. Just comparing how other WT sites handle Lojban, I find first that few others include Lojban terms, and second that the entry for badna on the Lithuanian WT at [[lt:badna]] uses the POS header daiktavardis, i.e. "noun", while the entry for casnu on the Malagasy WT at [[mg:casnu]] uses the POS header matoanteny, which further googling reveals to be the Malagasy word for "verb".
Again, the EN WT is for English-language readers. We should be using English in the descriptions. This is not to say that we cannot also use other languages in the descriptions, but at the bare minimum, we must write entries that an English-language reader can understand. This should be our ideal. Many of the Lojban entries I've looked at fail to achieve this ideal.
Japanese makes distinctions that English does not, so in the EN WT entries for such terms, we (the collection of JA editors here over time) have worked hard to come up with appropriate English-language labels.
Lojban terms like badna or grute or tsiju all look very noun-ish. These terms all seem to be labels for persons, places, or things.
Lojban terms like casnu or cusku or tavla all look very verb-ish. These terms all seem to be labels for actions.
I note too that Category:jbo:Verbs exists, as does Category:jbo:Nouns. These categories have at the top a sentence reading: This category is for Lojban words which would tend to be considered [ verbs | nouns ] from an English speaker's perspective.
Although Lojban grammar and grammarians might not distinguish between nouns and verbs, the fact remains that some of these words describe persons, places, or things (i.e. "nouns"), and some of these words describe actions (i.e. "verbs"). From an English-language reader's perspective, these words function like nouns and verbs -- it would be better to label them as such. Anyone who has gone through any education regarding English grammar, which we can at least begin to assume for the English-reading target population of the EN WT, will be at least passingly familiar with terms like noun and verb. We cannot assume the same familiarity with terms like gismu, and it is for this same reason that we are not using the term ​keiyō dōshi as a POS heading. -- Eiríkr Útlendi │ Tala við mig 22:06, 28 March 2013 (UTC)
I think you have a different (and possibly incorrect) idea about the purpose and meaning of the PoS header. As far as I know, it's not meant as a definition or even part of one. That is what the definition is for. I have tried to understand a bit more about Lojban to see how to explain this, and (as a side note) I noticed that not one grammar actually uses English terms to describe it. They all use bridi, gismu, selbri and so on. From what I understand, as far as Lojban content words go, they all have the same structure, which is one that would be translated as a verb. Even nouns and adjectives. badna really means "(subject) is a banana", and although we might call it a noun, it can take a subject (an argument) as if it were a verb. So in more familiar terms you might call it a stative verb, but stativeness is also a concept that is unknown to Lojban; all it has is predicates (which act like mathematical functions) taking one or more arguments. It's possible to leave the arguments out, with the idea that they're unimportant or obvious. I think badna used in a sentence by itself means "that which is a banana". So if I'm not mistaken, a predicate, when it is itself used as an argument of another predicate, becomes a relative clause. —CodeCat 23:32, 28 March 2013 (UTC)
  • @CodeCat, you note that: "I think you have a different (and possibly incorrect) idea about the purpose and meaning of the PoS header." It's entirely possible that I do. :) If so, then the most apparent (to me, anyway) alternate interpretation for what POS headers are for is for labeling the part of speech in terms specific to the language being described, rather than the language being used for the description. This would seem to mean that we should use ​keiyō dōshi as a header for that class of words in Japanese, given that this class does not exist in English and that it doesn't map entirely to the English POS adjective. I'm certainly open to that argument, but is that the correct extension of what you're saying about Lojban headers?
Any other editors out there with views on this subject? -- Eiríkr Útlendi │ Tala við mig 00:52, 29 March 2013 (UTC)
What I think we should do is use the descriptions that are the most common when discussing that language's grammar in English. For Lojban, that definitely means using the Lojban words because those are the terms that are the most familiar for such descriptions. There is also the case of Zulu and other Bantu languages, which have two classes of adjective-like word: one is a closed class that is usually called "adjective" and has more noun-like properties, while the other is an open class and is inflected like a verb/relative clause and is called a "relative" in most grammars. I don't know what term is used for Japanese, but if "keiyō dōshi" is the most common term used to describe those words even in English, then we should probably use that here too. —CodeCat 02:25, 29 March 2013 (UTC)
@Eirikr: IMO, Lojban is "just a weird enough language that Lojban entries are given a pass with regard to WT:ELE".
For natural languages, I prefer to use terminology that is recognisable to English-speaking linguists. (Note that this is not the same as insisting that languages have only parts of speech that English has—English does not use circumpositions, but circumposition is a recognisable part of speech.) For Japanese, that means I oppose using "keiyō dōshi" as a header, but would be amenable to "adjectival verb", "adjectival noun" or "nounal adjective".
Personally, I also prefer not to use specialised parts of speech when general ones are sufficient, hence Category:Abenaki nouns exists even though almost all of its entries could also, in very unhelpful analyses, be called nounal verbs or verb forms (segôgw (skunk, literally (third-person singular) urinates)), stative verbs/nouns (sips (bird, literally (is a) bird)), etc... if "adjective" would work for keiyō dōshi, I do think it ("adjective") should be considered. (Btw, some Abenaki nouns, such as kpiwi (woods), can be and might helpfully be listed as also adverbs.)
Artificial languages are a different matter, because they can be (and Lojban was) designed to have unnatural structures. I have long thought our coverage of Lojban was a mess, because—as Eirikr notes—"readers need to know Lojban and obscure notation before they can make any sense of purportedly English-language definitions of Lojban terms", but I am content to let it remain a mess because I don't imagine anyone but Lojbanists making use of it. - -sche (discuss) 02:26, 29 March 2013 (UTC)
Well, exactly. It's true you have to be a Lojbanist to understand the POS headers of our Lojban entries (Lord knows I don't understand them), but then you also have to be a Lojbanist to want to use our Lojban entries for anything. Although I too scratch my head in bewilderment at the POS headings of our Lojban entries, I accept that they're meaningful for the people interested in Lojban and that changing them to more familiar terms would be misleading at best and flat-out wrong at worst. —Angr 09:31, 29 March 2013 (UTC)
  • I'm not entirely against switching these to English names, so long as they were at least as useful as the ones we have. But the goal here is not to pound a square peg into a round hole, and take words that are the same type of speech and label them differently just because they'd translate into English in different parts of speech. These aren't nouns, verbs, etc.--Prosfilaes (talk) 21:14, 30 March 2013 (UTC)
As Eirikr pointed out as an aside, Māori doesn't really fit into our current L3 system very well, and a lot of languages I like don't. One Tongan dictionary I have simply categorises most words as substantives or verbs, and that seems to work pretty well. We really ought to step down from shoving other languages into L3s that are comfortable for us. If Lojbanists agree on a coherent way to present a very un-English language, then we need to respect that. —Μετάknowledgediscuss/deeds 18:14, 31 March 2013 (UTC)
  •   Re: Lojban and POS labels, folks above note that these are intelligible to Lojbanists, full stop.
I thought the EN WT was intended as a many-languages-to-English dictionary? If no one here really understands Lojban entries, that seems to be a very firm indictment that our Lojban > English entries fail pretty hard.
Even if we are to use Lojban-ish POS headers, some of these Lojban labels overlap sufficiently closely to the English labels that it beggars my understanding why we don't use the English. Leaving aside the issue of what the heck a gismu is, a rafsi seems pretty clearly to be an affix. So why don't we use the transparent label ===Affix=== for the POS header, instead of using the label ===Rafsi=== that no English speaker knows?
  •   Underlying this query of mine about Lojban and POS headers is the very real and very deep concern about what we're doing here -- what is the point of the English Wiktionary? If the point is to serve as a many-to-English dictionary, then should we not be writing entries for English speakers?
(I'm not shouting here, I just really want to emphasize that question.) -- Eiríkr Útlendi │ Tala við mig 18:37, 1 April 2013 (UTC)
I've always thought that was our target and the justification for getting the resources that we get from WMF. I am not at all sure that they are happy with our net contribution. I have no objection to all kinds of technical and obscure linguistic terms and concepts being used here, but they should almost certainly not be displayed by default to casual users. L3 headers are a prime example of what should be somewhat intelligible to normal users, but also glosses and "context" tags. Non-English L2's that contain definitions that use words that we label obsolete, dated, or archaic should be cleaned up, with more current words substituted. Gratuitous use of arcane terms when a more ordinary and current term is a synonym or near-synonym should be avoided in English definitions as well. And "grammar" context labels really don't need to use terms like ambitransitive, ergative, ditransitive when modest rewriting can eliminate the need.
I think we need the capability to have context labels that by default do not display, but which can be displayed by user preference. This might allow us to have our cake and eat it too in most cases. DCDuring TALK 19:17, 1 April 2013 (UTC)
  • Re: just the bit about labels that don't display by default:
Sounds like the equivalent of "advanced" settings in configuration UIs. :) That could be handled using CSS classes in the context / label templates, no? Should be relatively easy to implement. -- Eiríkr Útlendi │ Tala við mig 19:24, 1 April 2013 (UTC)
One might think so, but {{context}} is surprisingly complicated. Ruakh said he thought it was among the templates that most merited Lua/Scribunto-ization. From my limited knowledge of CSS, it would seem that we would want each context tag to have attributes that determined whether it was hidden by default and categorized. There are further questions as to whether topical categories should display differently when they do not reflect a limited usage domain (eg, airplane is topically in an aviation category, but is not at all limited in its usage). DCDuring TALK 20:24, 1 April 2013 (UTC)
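As a rough illustration of the CSS-class idea raised just above, here is a minimal Python sketch of what a label renderer might emit. Everything in it is hypothetical: the class names `context-label` and `context-advanced`, the two CSS rules, and the `render_label` function are invented for this example and do not reflect how {{context}} actually works.

```python
from html import escape

# Hypothetical class names; nothing here mirrors the real {{context}} template.
BASE_CLASS = "context-label"
ADVANCED_CLASS = "context-advanced"

# A site-wide rule could hide advanced labels by default...
DEFAULT_CSS = f".{ADVANCED_CLASS} {{ display: none; }}"
# ...and a reader who opts in could override it in a personal stylesheet.
OPT_IN_CSS = f".{ADVANCED_CLASS} {{ display: inline; }}"


def render_label(text, advanced=False):
    """Wrap a context label in a span, tagging specialist labels with an extra class."""
    classes = [BASE_CLASS]
    if advanced:
        classes.append(ADVANCED_CLASS)
    class_attr = " ".join(classes)
    return f'<span class="{class_attr}">({escape(text)})</span>'


print(render_label("aviation"))
print(render_label("ditransitive", advanced=True))
print(DEFAULT_CSS)
```

This covers only the show/hide part. It says nothing about categorization, or about telling topical categories apart from genuine usage restrictions, which are the harder questions raised above.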
If we're writing entries for English speakers, why do all these entries have crap about gender? Words don't have gender!
If you actually want to use the dictionary, you probably don't want it dumbed down for people who don't know anything about the language. If you're an English speaker who knows nothing of the language, our dictionary--any dictionary--will drive you nuts. Wann fängt man an, das Haus zu sanieren? We don't even mention that an is a separable prefix in German (which we probably should), but even if we did, the English reader would still have to figure out whether it's a separable prefix (of what? sanieren? Wann? Probably man, because that's closest) or one of many prepositions.
To grab another example, suomi is of type ovi (?!?) and has declensions of inessive, illative, adessive, abelative, allative, essive, translative, abessive, and comitative. They all look made-up to me, and even with the language-savvy people here, I'll be impressed if you've already realized I made up one and know which one that is. I, however, do not believe that just because I don't understand it, there must be an easier way to describe Finnish.
People who actually want to use our Lojban entries know what gismu means, because if you don't, you don't have a shot in hell of making more than word salad from a Lojban sentence no matter what we do. There's an argument for using affix instead of rafsi, but to me there's a stronger argument for using a consistent set of terminology for Lojban instead of mixing gismu and affix. If you wish to propose a set of replacements for the Lojban POS, I'm all ears, but consistency and usefulness to someone who knows enough Lojban to actually use a Lojban-English dictionary is important to me.--Prosfilaes (talk) 21:50, 1 April 2013 (UTC)
I agree with everything Prosfilaes said... I couldn't have said it better! —CodeCat 22:25, 1 April 2013 (UTC)
I'd love some evidence about who actually uses en.wikt for Lojban. I expect that most of the learning that takes place is by the contributors.
If someone is a serious student of Lojban, for how long in the language learning process do they need English glosses? There is a Lojban Wiktionary, after all, just a click away for a non-novice user, which gives the Lojban PoS. DCDuring TALK 22:34, 1 April 2013 (UTC)
So the world can be divided up into people that don't need Lojban definitions because they don't know enough Lojban grammar, and those who can use a Lojban dictionary? Why is that true of Lojban and not any other language?--Prosfilaes (talk) 01:17, 2 April 2013 (UTC)
  • (...after edit conflict...) Re: Finnish:
ovi isn't the POS. That's just plain ===Noun===. There's also a link, leading to an explanation in relatively clear, if somewhat obscure, English about what an ovi-type is. (FWIW, ovi means door, and is listed in that header as an exemplar term for that class of nouns.)
By contrast, gismu is given as the POS. There's no link. Manually looking up the term gives me a gibberish definition that's more notation than description, such that I have no clear explanation of what this is, even if I take the bother to try to find out. I note that other editors have commented that the definition of gismu, such as it is, leaves them similarly confused.
  • I've previously made the argument that Lojban terms that are labels for persons, places, or things should be listed under a ===Noun=== header, while those terms that are labels for actions should be listed under a ===Verb=== header, specifically from the point of view that noun and verb are generally understandable English terms, while gismu is not. The fact that a noun-ish gismu can be used predicatively strikes me much more as a matter of syntax, and thus something that should be explained in Appendix:Lojban_grammar or some similar place. I find it interesting that Category:Lojban_appendices doesn't even exist, particularly given how deliberately odd this language is.
As best I can tell from reading the gismu section of the EN WP Lojban grammar article, the label gismu is more a statement of the term's intended functioning within the language as a root word from which other words may be constructed. This might be helpful for learning about the morphology, but it tells us nothing about the semantics, and is not a very useful POS label for lexicographical purposes. One may as well adopt a similar position for root English terms like talk or sink, and simply list all senses under a single ===Root=== POS header. -- Eiríkr Útlendi │ Tala við mig 22:52, 1 April 2013 (UTC)
  • And as an aside about gender, while English may not have grammatical gender aside from pronouns, the language does have the word gender, complete with an entry that is understandable by English readers. Grammatical gender and other features of non-English languages are at least described in English. Lojban alone seems to be described using Lojban. -- Eiríkr Útlendi │ Tala við mig 22:58, 1 April 2013 (UTC)
  • I feel like you missed much of the import of what Prosfilaes said. You still seem to be intent on catering to the lowest common denominator without thinking about how Lojban dictionaries are written. The only conclusion I draw is that we should wikilink L3 headers that use obscure words. That's a policy I could support. PS: As soon as I saw the joke grammatical case, I noticed it and it started to bother me. But declension is a special interest of mine, even though I know no Finnish. Μετάknowledgediscuss/deeds 01:08, 2 April 2013 (UTC)
  • I don't see the distinction between the POS and anything else there. Adessive is about as much an English word as gismu; that is, complete and total gibberish to the vast majority of English speakers, but hopefully familiar to the people who need to know what it is. Go ahead and improve gismu; I don't see that as relevant.
  • I think this is getting confused, because you're conflating two somewhat different things, the choices of the words we use for the parts of speech and what the parts of speech are themselves. As I said, I'm open to discussion on the first part. The second part, however, is absurd. Are we going to cram adessive words in with nominative, dative, accusative or genitive words? Because those are the English options, and the fact that all Finnish grammarians put them in their own class is irrelevant. Gismu is a part of speech in Lojban; stop trying to force it into English shaped boxes.--Prosfilaes (talk) 01:17, 2 April 2013 (UTC)
Well... technically gismu isn't a part of speech. As far as I know a gismu is a kind of brivla that is not a compound or foreign word, but is a basic root, much as you might call green an English root word but not greenish or blue-green. Brivla is the actual part of speech, because the different kinds of brivla, gismu included, are interchangeable. However, Lojban is unusual in that it generally requires loanwords to be specially marked as such using particles, so technically "loanword" (called fu'ivla in Lojban) is a part of speech in Lojban. It's comparable to the Japanese use of Katakana, except that the distinction is also spoken. —CodeCat 01:44, 2 April 2013 (UTC)
  • Allow me to restate. What is the purpose of the EN WT? If it is to serve as a many-to-English dictionary, should we not write entries that English readers can understand?
It seems no one here understands the word gismu well enough to explain what one is. This therefore seems to be a very poor choice of term to use as an entry label, especially one as important as the POS header.
Some have brought up the issue that other labels can also be obscure. While I might agree that adessive is not a word I use in daily conversation, and not one that I can claim to know very well, I must also point out that at least we have the resources available here for interested readers to find out what adessive means. I'd also like to point out that adessive is not a POS, but rather a declensionary subset of nouns. Much as adessive is a category of sorts for nouns, I see no reason why gismu (or, ideally, something more intelligible to the general English-reading public, such as root word) could not be a category for Lojban nouns, verbs, etc.
  • @Prosfilaes, you asked above,

So the world can be divided up into people that don't need Lojban definitions because they don't know enough Lojban grammar, and those who can use a Lojban dictionary? Why is that true of Lojban and not any other language?

Ironically, that's a different take on part of what I'm asking, only I think you're trying to make the opposite point. My point is that the labels for Lojban entries are in Lojban, which no one here seems to know very well, and for which we have pretty wholly inadequate definitions -- inadequate for any English reader who doesn't already know Lojban. So why is that true of Lojban and not other languages? It seems to be true in part because we're using Lojban to describe Lojban.
  • Reading through w:Part of speech, I see plenty of room for argument that a Lojban > English dictionary would do well to use labels like noun, verb, etc. Note that a dictionary that only Lojbanists can use does not meet my definition of a Lojban > English dictionary.
-- Eiríkr Útlendi │ Tala við mig 05:50, 2 April 2013 (UTC)
Then your definition of a Lojban > English dictionary is pretty weird and limited. As above, to make proper use of a dictionary like ours, you have to know something of both languages. There's other choices out there; I'm sure there are Germanic dictionaries that store Yiddish and Gothic in the Latin script, since most English speakers don't read Hebrew script. It certainly would help me when investigating what part Yiddish vocabulary played in Esperanto, but instead we made a Yiddish -> English dictionary that can only be used by people with some familiarity with Yiddish.
Reading through w:Part of speech, a general summary of a subject with emphasis on Latin and English, tells you everything you need to know about the structure of Lojban? No disrespect, but the Dunning–Kruger effect comes to mind. You haven't established that "noun" and "verb" are reasonable headings to use for Lojban words. Again, what words are being used as Lojban POS labels is much more up for discussion to me than whether or not Lojban words should be split up into categories that Lojban grammarians don't.
Moreover, who is actually working on Lojban? If nobody who knows Lojban is interested in splitting them into nouns and verbs, the idea is dead on arrival. Renaming is doable by bot; rearrangement needing intelligence isn't.--Prosfilaes (talk) 07:25, 2 April 2013 (UTC)
  • Well, gast my flabber. With your comment that my "definition of a Lojban > English dictionary is pretty weird and limited" (i.e., my definition that a Lojban > English dictionary must be something that doesn't require being a Lojbanist to use), you seem to be saying that the EN WT is not a many-to-English dictionary, but rather a project for specialists in their individual fields (i.e., in this specific case, that the Lojban portion of the EN WT should only be for Lojbanists, and need not be intelligible to anyone else). Do I understand you correctly? Is that what you are saying? If that's not your intended meaning, then clearly I'm confused as to your position. I'm increasingly getting the sense that you and I are talking past each other. -- Eiríkr Útlendi │ Tala við mig 15:24, 2 April 2013 (UTC)
  • Did you read beyond that? A Yiddish-English dictionary written in Hebrew script, like ours and most are, is not intelligible to English speakers. Any Foo-to-English dictionary is a tool for people with some knowledge of Foo. People who know enough Lojban to understand gismu and rafsi aren't specialists; they're the natural market for a Lojban to English dictionary.--Prosfilaes (talk) 21:45, 2 April 2013 (UTC)
    We do try to get contributors in languages that use non-Latin script to take the trouble to add non-idiosyncratic transliterations that can be found from our search box, don't we? Or is that too much to expect, too much catering to the lowest common denominator, the most scriptively challenged?
The use of the Gismu (Root) PoS header would almost certainly not have been tolerated if folks knew that's all that was meant. We don't have "Root" as a PoS for any other language, do we? That is not because other languages don't have such morphemes. It is because the Lojban PoS headers effectively excluded outside review. The Lojban "parts of speech" seem to confound etymological ("loanword"), morphological, and functional categories even more than our PoS headers for English. And a "name word" can't be called a Noun? Would we accept Metaphor as a PoS?
Perhaps the real problem is that Lojban is not a natural language and does not belong in a dictionary of natural languages. DCDuring TALK 22:34, 2 April 2013 (UTC)
It's amazing the number of people who argue that we don't know what gismu means who also argue that we know what it means.
I've never opposed adjusting the PoS of Lojban. So far nobody that knows Lojban has suggested it and proposed an adjustment, and certainly nobody has offered to make the complex changes that are being proposed here. And the changes you're proposing are not trivial; is red an adjective, a noun (a name word, "red object"), or a verb (łichííʼ)? The lines between adjective and noun can be thin enough in English; you're telling me we know certainly that it must exist in Lojban?--Prosfilaes (talk) 23:30, 2 April 2013 (UTC)
  • (...after edit conflict...) Yes, I read beyond that. And I read enough into it to realize that, if our underlying operating assumptions were so radically different, then many of the points I've been trying to communicate would not be understood in the way I had intended them. Consequently, I thought it best to try to clarify what your operating assumptions are regarding the goal of the EN WT.
Incidentally, I still don't know that -- what is your take on the point of the English Wiktionary? Is it to be a many-to-English dictionary, or is it to be a collection of specialist reference materials?
You mention Yiddish. I don't work on Yiddish, and I have almost no knowledge about that. I've run into similar problems of intelligibility in researching etymologies that led me to Sanskrit terms that had no romanization. This site has some very profound issues in terms of information accessibility -- I think either that many of us have gotten so far into our specialized mindsets that we forget that beginners can't understand what we write, or that many of us do not have any real goal of making this information available and understandable, or at the very least discoverable, to the average English reader. (This covers cases like adessive as a label, which can be discovered and at least somewhat understood by clicking on the entry, but not gismu, which no one here seems to understand, and which entry is quite opaque.)
My personal point in tackling Japanese entries here is in part to lower the barriers of entry to English readers who might try to understand that language. I recall my own painful frustration at feeling like I needed to read the whole dictionary just to be able to understand a single entry. I had multiple reference books commonly open on my desk -- an EN > JA dictionary, a JA > EN dictionary, and a kanji dictionary so I could try to puzzle out the other two. Until Kenkyusha came along with a furigana dictionary for non-Japanese-speaking learners, working with any JA dictionary was a real chore -- the information was not very accessible, nor very discoverable. Wiktionary offers tools to overcome these two profound issues, and I hope that some of my work here might make things easier for the next generation of English-reading learners of Japanese.
My understanding of the whole underlying point of the English Wiktionary is that it is intended to be a many-to-English dictionary, as a resource targeting the average English-reading dictionary user, with (ideally) no bias towards any specific target language. My misgivings about Lojban, and indeed about lack of romanization for Sanskrit or Yiddish or Georgian entries, arise entirely from this foundational precept.
Prosfilaes, what is your view of the basic point of the English Wiktionary? Is the ideal to create a resource for potentially any reasonably fluent English-language reader? Or is the ideal to create a resource for specialists in the field? -- Eiríkr Útlendi │ Tala við mig 22:47, 2 April 2013 (UTC)
"Is it to be a many-to-English dictionary, or is it to be a collection of specialist reference materials?" You're trying to force a dichotomy that I do not accept in the least. The Oxford English Dictionary is a rare specialist reference material; why would you expect a many-to-English version of that to be any different? The places where we fall short of being maximally generalist, in deleting things that are SoP yet people might look up, are things that lean towards specialist reference material as opposed to the more generalist dictionaries.
You can't click on adessive and find out what it means; it doesn't link anywhere. You could improve gismu, and until you do I don't think you're competent to discuss the splitting up of Lojban PoS.
The Lojban entries in Wiktionary will be most used by people who know some Lojban. Since all the grammars use words like gismu, that means those words will be familiar to the people who are actually using Lojban entries in Wiktionary. Changing them to some other words will probably make it harder for the people trying to use those entries. Changing them to something that doesn't reflect the underlying nature of the language will be a substantial dumbing down that is not worthy of any work calling itself a dictionary and seriously hurt the few people actually trying to use these entries.--Prosfilaes (talk) 00:57, 3 April 2013 (UTC)
I haven't read the full discussion but having specific POS for some languages is not so uncommon. Consider Arabic "masdar" (مصدر (máṣdar)), sometimes called a "verbal noun" but that's only an approximation, the usage of masdars is wider. Russian predicatives (Category:Russian_predicatives) were also considered non-standard at some stage. We have to cater for the English audience but also educate users about language-specific parts of speech. The Japanese の adjectives (Category:Japanese の-no adjectives) got deleted, even though English linguists often use this term, whereas Japanese linguists consider them noun phrases, e.g. 病気の人 (byōki no hito), "a sick person", lit. "person of the sickness". I have no hard feelings about the deleted category but can this collocation really be considered the same as "fur coat" by the English reader? A new term "adjectival noun" is used to describe such nouns. Just my two cents. I haven't read the full discussion, so it may not be quite to the point. --Anatoli (обсудить/вклад) 01:21, 3 April 2013 (UTC)
  • Because my previous comment did not make this clear, I want to point out that I oppose changing Lojban's headers from "gismu" etc to "noun" etc. It would be inaccurate to call gismus "nouns"—Lojban was designed, as artificial languages can be, not to have nouns, verbs etc, with the unsurprising result that English-speaking linguists don't describe it as having nouns, verbs etc. ("Lojban nouns" is barely attested on raw Google, and nonexistent in Books.) - -sche (discuss) 03:38, 3 April 2013 (UTC)
    • If it helps, I've changed the definitions of gismu and brivla somewhat, hopefully to make it more clear. I'm not sure if "gismu" is actually a part of speech in Lojban, I think brivla really is the PoS. On the other hand, a fu'ivla (loanword) often behaves differently from other brivla so the idea that all brivla are interchangeable doesn't work here. They do all work more or less the same, though. —CodeCat 14:05, 3 April 2013 (UTC)
  • I oppose changing the headers to noun, etc, but I think it would probably be helpful if we could link the headers to an Appendix:Lojban gismu which could explain exactly what it means. --Yair rand (talk) 01:34, 4 April 2013 (UTC)

Romanization and definition line

FYI: Wiktionary:Votes/pl-2013-03/Romanization and definition line.

Let us discuss as needed, and then start the vote; let us postpone the vote as needed.

I propose that each romanization entry is required to have a definition line in the wikitext. This is already the case with Pinyin romanization entries and Gothic romanization entries. Entries that direct the reader to another page hosting definitions include alternative form entries and inflected form entries; these do have a definition line in the wikitext per common practice. See also Category:Gothic romanizations and Category:Mandarin pinyin. --Dan Polansky (talk) 18:01, 30 March 2013 (UTC)

Recognizing User-Page Spam

We've been getting lots of user pages lately created by spambots. Looking through the deletion logs, I see quite a few deletion comments showing that admins aren't recognizing them for what they are.

Spam isn't just for putting advertising where people will see it. The main purpose, lately, is to get Google to see it. Google partly bases the order in which search results are listed on how many links and references there are on high-traffic sites like ours. The content of the spam page is irrelevant: as long as it contains links to the site or key phrases associated with links to the site anywhere on the page, it will improve the page ranking on Google.

The most common type of user-page spam is text taken from other websites with links and brand names hidden in it. The purpose of the text is to camouflage the links and brand names so the page is less likely to be deleted. Such pages aren't just spam, they're also copyvio. Delete them as promotional material, and permanently block the poster for spamming (some bots will re-create the pages if they're deleted and the account isn't blocked).

The other type is a fake profile, with a randomly-generated name, randomly-generated personal details, and a spam link said to be a favorite site or the user's home page. The combination can be really funny: a user name starting with "Dave" that belongs to a 16-year-old girl in Switzerland who likes horses, and whose favorite site sells erectile-dysfunction meds, or tractors in India. These should also be deleted as promotional material and the user permanently blocked.

The first of the types I just mentioned also gets posted to talk pages, so it should be treated as spam there, as well. Chuck Entz (talk) 18:48, 30 March 2013 (UTC)

The easiest way to patrol these is to look for the "new-user-page" tag. Not all edits with that tag are bad, but most of them will be. —CodeCat 19:13, 30 March 2013 (UTC)
I thought there was some way to get Google to ignore those pages, which would eliminate the incentive to create such pages. bd2412 T 19:17, 30 March 2013 (UTC)
Do you think that would actually work? —CodeCat 19:26, 30 March 2013 (UTC)
That won’t work. There’s zero cost for them to keep spamming every wiki in the world even if it only does any good on some tiny percentage of them. It would be more work for them to be selective. Michael Z. 2013-03-30 19:37 z

Standards of identity and legal definitions

Following this RFD, this BP discussion, and this old discussion, I have created Wiktionary:Votes/pl-2013-03/Standards of identity and legal definitions of terms. Most of the credit for the proposal, which I hope I've done an adequate job of wording, goes to bd2412. Please discuss here or on the vote's talk page any change to the wording or any entirely different approach you'd like to see, after you read the previous BP discussion (link) for background. - -sche (discuss) 21:34, 30 March 2013 (UTC)