Wiktionary:Grease pit/2008/November

`{{plural of}}`

This template does not take full advantage of language or script arguments. For example, in {{plural of|dom|lang=pt}}, the link it creates is simply [[dom]], not [[dom#Portuguese|dom]]. And if there is a script argument, it has no effect on the appearance of the link; for instance, {{plural of|كتب|lang=ar|sc=Arab}} (which produces "plural of كتب" instead of the expected "Plural form of كتب"). —Stephen 16:48, 5 November 2008 (UTC)

Even for English, it should section-link to English#Noun (or possibly English#Etymology n, where n is not equal to 1). DCDuring TALK 17:00, 5 November 2008 (UTC)

Note this is not possible. You can't link to (name) a nested section, e.g. #English#Noun, and linking to a numbered header will just break when the target page is changed. Robert Ullmann 18:03, 6 November 2008 (UTC)[reply]

For English it's possible, I think, as it can just link to #Noun. Wouldn't work for other languages.—msh210℠ 06:46, 7 November 2008 (UTC)[reply]

Yes, that's what I do by hand. English uniquely has the advantage that it can be omitted. DCDuring TALK 08:24, 7 November 2008 (UTC)[reply]

The problem is that people are encouraged to use this template as {{plural of|[[word]]}} in order to increase the page count. This prevents it from doing clever tricks. We should stop doing this for that reason. Conrad.Irwin 01:04, 6 November 2008 (UTC)[reply]

That just requires a bit more magic. Yes, at present linking the referenced form prevents/would prevent the section link from being added, but it won't keep the lang= param from categorizing (as it does now), and won't interfere with implementing sc= if desired. Now, if I could extract a #unlink parser function from the developers, there is lots we could improve. But it is hard enough to keep them from steadily breaking things, so I have no hope of getting an enhancement. Eh? (Even tho' simple and useful, and about zero overhead.) (I also find the section linking to be supremely annoying, and wish I could figure out magic (short of JS) to turn the effing things off! For me. I just want the page, not some seemingly random position within. Maybe if it worked properly?) The demand for section linking does make quite a number of things enormously more complex. It accounts for at least 70% of all of our template code complexity, direct and including the side effects. But people endlessly want it ... (;-) Robert Ullmann 17:59, 6 November 2008 (UTC)[reply]

The section linking to PoS OR Etymology (never both, I know) only matters for long English sections. It matters less for {{plural of}} than for the verb form templates, like {{past of}}, because verb lemmas often appear last in entries with multiple English PoS. I sometimes do it by hand. DCDuring TALK 20:00, 6 November 2008 (UTC)[reply]

Because we have no other, more relevant metric, the entry count has assumed great and probably unwarranted importance. We could well stand to pay greater attention to other figures of merit. Our share of visitors relative to other dictionary sites is not a figure of merit. We need a few measures that more directly connect to the efforts of contributors: number of PoSs by language, number of lemmas (one-word, hyphenated, and other), number of non-stub etymology and pronunciation sections, number of non-conforming (WT:ELE) entries are non-user-based measures. The fact that we have no model of how these measures contribute to whatever service objectives enwikt may be said to have merely affects the effectiveness of our efforts. The measures could still show progress (or its lack) and possibly spur effort (or thought as to direction or new initiatives). DCDuring TALK 05:45, 6 November 2008 (UTC)[reply]

It's more than just our metric. WM considers a page with no explicit wikilinks to be "bad" and it gets added to a list of pages that don't link out. If we eliminate linking from within templates like these, then we can't find those pages that truly do need to have wikilinks added to improve them. The links are one of the key ways in which an electronic dictionary improves over a print dictionary. --EncycloPetey 20:17, 6 November 2008 (UTC)[reply]

I agree that other metrics are also important. I would just like to have more of them so that the few we have don't take on unwarranted importance. As you know I would like to have measures of success with respect to users (repeat visits, visit counts relative to competitors) and quality measures for the first screen a user hits (how many pagedowns/scrolls to find what they want OR screen-inches to first definition). I wish I knew how to do these things myself, instead of wishing on a star. DCDuring TALK 21:06, 6 November 2008 (UTC)[reply]

`{{es-conj-er (c-zc)}}`

I see that there is a problem with this Spanish template. The reflexive imperatives are missing accent marks. In parecerse, it incorrectly shows: parecete, parezcase, parezcámonos, pareceos, parezcanse...it should read like this: parécete, parézcase, parezcámonos, pareceos, parézcanse. The template is far too complex for me, someone else needs to fix it. —Stephen 04:35, 6 November 2008 (UTC)[reply]

Fixed. --Bequw → ¢ • τ 09:53, 6 November 2008 (UTC)[reply]

Template:ro-decl-adj hides red links

I think it unfortunate that Template:ro-decl-adj shows red links in the normal text color. I did not find a way to undo this, it is probably a CSS issue, I think with class inflection-table. Could someone change this such that red links are red again? H. (talk) 11:17, 6 November 2008 (UTC)[reply]

Most inflection templates hide red links. Many people find it ugly to have loads of red links in a table. Having the links present but in normal text color was the compromise found between those who wanted links to exist for entries not yet created and those who didn't want their tables all full of red. Angr 11:29, 6 November 2008 (UTC)[reply]

There was a lot of argument about this. See this discussion and others similar. If you want to fix it for yourself, put the following code into Special:MyPage/monobook.css and do the cache-clearing rigmarole. Finding a globally accepted solution is unlikely to happen. Conrad.Irwin 14:01, 6 November 2008 (UTC)

 .inflection-table a.new { color: #CC2200; }

This should be moved to WT:PREFS, so that those who have conniptions when they see red links in inflection tables can select it, and everyone else defaults to the correct behaviour. Conrad, I thought you were going to do this; I was surprised to see it show up in the common css? Requiring users who want standard behaviour to add custom CSS is no good. (I hardly need to tell you that ;-) Robert Ullmann 17:46, 6 November 2008 (UTC)[reply]

I thought I was going to do that too, but I think I got moaned at by people who insisted I shouldn't change how things worked without a vote. Conrad.Irwin 01:39, 12 November 2008 (UTC)[reply]

It wasn't ~~broken~~ decided by a vote in the first place, and was hardly "resolved". Just do it right. Seriously non-standard wiki behaviour must be opt-in, not opt-out. Robert Ullmann 17:17, 12 November 2008 (UTC)[reply]

If you like you can start a vote to overturn the last decision. That way, those of us who agree with this practice can quickly vote it down and not have to endure another long and fruitless discussion on this resolved issue. --EncycloPetey 20:19, 6 November 2008 (UTC)[reply]

Agree with EP and Robert Ullmann. This should remain as is, but a WT:PREFS bit would be nice. -Atelaes λάλει ἐμοί 20:52, 6 November 2008 (UTC)[reply]

The options are,

"No links in inflection tables" (what's the point in linking to form-of entries anyway)
"Red links in inflection tables" [with a WT:PREF to go black] (as people won't create the pages if they don't know they don't exist)
"Black links in inflection tables" [with a WT:PREF to make it red] (as inflection tables are hard to read if full of red links)
"Blue links in inflection tables" (and set up a nifty bot-like thing that will automatically create inflections for trusted users - though there will still be a few "other" links hanging around)
"Green links in inflection tables" (and give people accelerated form of creation everywhere - though they may not use it)
"Octarine links in inflection tables" (which change both the caption of the link and the link title at random to amuse people like me)

Shall we vote on it, do a bit more bikeshedding, or just choose the obviously correct answer? :p Conrad.Irwin 01:39, 12 November 2008 (UTC)[reply]

Just take the line out of common css so we get the correct (default) behavior, and add WT:PREFS to turn them black for people who want that. (and check it against the standard prefs option on displaying "red" links) I thought you were going to do this? Robert Ullmann 17:09, 12 November 2008 (UTC)[reply]

See my reply above... Conrad.Irwin 17:11, 12 November 2008 (UTC)[reply]

None of this would be a problem if you'd just done it right in the first place (and you know this :-). Please just do that, and then there will be nothing left to vote on. Please? Robert Ullmann 17:30, 12 November 2008 (UTC)[reply]

The WT:PREF is now there (will make the red links black). I do not believe there is enough consensus to remove the default rule (which is the same as enabling the PREF - disabling the PREF won't make the links red). Conrad.Irwin 02:21, 13 November 2008 (UTC)[reply]

What about the Italian option. Red links in inflection and conjugation tables and a bot to create the forms automagically. SemperBlotto 10:09, 12 November 2008 (UTC)[reply]

See shared-form-of stuff below. Conrad.Irwin 02:21, 13 November 2008 (UTC)

XML dumps

FYI:

WMF is finally running the XML dump of the en.wikt: http://download.wikimedia.org/enwiktionary/20081112/

The daily for today is done: http://devtionary.info/w/dump/xmlu/en-wikt-20081112.xml.bz2

Tomorrow's daily will be built from the WMF dump, and then proceed forward day by day as usual. Robert Ullmann 17:25, 12 November 2008 (UTC)[reply]

Today (13 Nov) daily available. Works just fine to use the WMF dump and go forward. Robert Ullmann 15:58, 13 November 2008 (UTC)[reply]

Ideas for shared form-of bot

It seems to me that there are a lot of languages that could do with running a form-of bot, but for which we have no native-speaking contributor volunteering to run one. To this end, it would be nice if there was a form-of bot that a set of trusted users in a given language (maybe run in a way similar to WT:WL) could instruct to create inflections. It shouldn't be too difficult to run such a bot, on the condition that we can easily give it the entries to create - as there are different standards governing form-of entries in different languages it probably couldn't just use the same template for everything (though if it could, that would be even better). I therefore propose a bot which works in the following way:

Configure the bot for a particular part of speech in a particular language.
1. Add meta-data in class-names to the relevant inflection tables (as accelerated editing does).
2. Create a template that generates a list of pages as its output (so the bot doesn't have to worry about formatting conventions).
Have a requests page, and to add a new request, the trusted user simply replaces the page's contents with links to relevant pages/sections (this allows trivial checking of who added which request).
Have a permissions page where a (full-protected) list of Users/Languages is maintained (to prevent too many random forms being created)
Have a mechanism to undo a request.
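The workflow listed above can be sketched roughly as follows. This is only an illustration of the permission check, the request handling, and the undo step; the `PERMISSIONS` table, the `Request` record, and the function names are all hypothetical and do not describe any existing bot.

```python
from dataclasses import dataclass

# Hypothetical permissions list: user name -> language codes they may request.
PERMISSIONS = {"ExampleUser": {"fr", "pt"}}


@dataclass(frozen=True)
class Request:
    user: str
    lang: str
    lemma: str
    forms: tuple  # inflected forms, as a page-list template might output them


def is_allowed(request, permissions=PERMISSIONS):
    """A request is accepted only if the user is listed for that language."""
    return request.lang in permissions.get(request.user, set())


def process(request, existing_pages, permissions=PERMISSIONS):
    """Return (title, request) pairs for the pages the bot would create."""
    if not is_allowed(request, permissions):
        return []
    # Skip pages that already exist so the bot never overwrites an entry.
    return [(form, request) for form in request.forms if form not in existing_pages]


def undo(created, request):
    """Return the titles that were created for a request being withdrawn."""
    return [title for title, req in created if req == request]
```

Recording the originating request next to each created title is what makes the undo step a simple filter.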

Before I get too excited about building such a thing, would anyone be interested in using it? Would people be happy for such a bot to exist? Would it be better to create "page-list-templates" or add "meta-data" to existing templates and then use a standard form for the entries? Conrad.Irwin 20:48, 12 November 2008 (UTC)[reply]

There are languages and POSes where this could work easily, but there are languages where it may be a very bad idea indeed. Let me give a Latin verb example of scary things that can happen. Latin is a highly inflected language; a single verb entry will have more than 100 inflected forms. I have been going through and cleaning up Latin verb entries.

More than 10% of the time, the conjugation table for the verb has been wrong. What can be wrong? The wrong basic pattern (of 5 major Latin verb patterns). Or, in cases where the basic pattern is correct, some Latin verbs have only active voice forms known, or only passive voice forms known, but the table chosen by the page's last editor includes both forms, so that there are more than 50 incorrect links in the conjugation table. This doesn't account for additional problems of defective verbs, where some inflectional forms are unattested in the surviving literature (such as only having third-person passive forms, but not first- or second-person passive forms). Again, if the previous editor did not do a thorough checking of the verb's conjugation pattern beyond what even some good dictionaries provide, then there may be 50 or more incorrect red links in the conjugation table on the entry. If a bot is then run on that page, using the incorrect template, we generate 50 or more nonsense entries. But even that's not the full scope of the potential problems. Some inflected forms of verbs in Latin are Participles, which themselves have a large set of inflectional forms. So the single cell in a verb conjugation table for the perfect passive participle is just for that participle's lemma. The participle itself will have an inflectional table with 36 additional cells!

The result is that I held off for a long time before running a bot on Latin verbs even though I felt reasonably comfortable in Latin. And there were many more considerations I tried to explore and solve before proceeding. I'd be very, very wary of having a bot that generated inflected entries for any but the most worked-over languages here, and for which we have people knowledgeable in both the grammar and translation of said language as well as knowing something of how Wiktionary sets up entries. Otherwise there is a tendency to run solely on the enthusiasm without taking full and proper time to work out issues first, in which case we end up with the kind of mess we got for Spanish verb conjugations several years ago (a mess that still hasn't been cleaned up).

That's not to say there aren't languages for which such an effort could be useful. I expect that French, Portuguese, and Russian might be viable for such a bot at least, but there are other languages that have not received proper attention from experts, or where decidedly non-expert editors have run amok. However, I'd have to say that my view on the general viability of such an endeavor is pessimistic. --EncycloPetey 05:27, 13 November 2008 (UTC)[reply]

While I think there is merit to the idea, EP does well to note how amok such a thing could run. There are certain users who I would trust with such capabilities, but the list is short. Additionally, I would argue that such a thing should be limited to living languages, because of the specific and peculiar issues inherent with dead languages, which EP noted. The benefits to such a project could be quite large indeed, but we would have to be very careful, lest we wind up with a very large mess on our hands. -Atelaes λάλει ἐμοί 06:41, 13 November 2008 (UTC)[reply]

language templates and some magic

Quite a while ago, DAVilla tried to set up language templates with a #switch, where the default would be the language name linked or not as preferred, an option would be to get the name un-linked, and other things. This made a complete mess when subst'd, as the entire switch ended up in the page (subst doesn't recursively subst parser functions, which is of course a serious bit of brain-damage in the parser, but that is hardly news). Today I was once again noting that it is possible to do if/then/else without the parser function #if; indeed that is how it used to be done. And it occurred to me (and I don't know why I didn't see this before) ...

At present, if a language name is usually linked in the translation tables, we link it in the language template, so that for language code xx, {{subst:xx}} will do the right thing. (And if left un-subst'd, AF will do that.) This makes it impossible to use these templates for lots of other useful things, since the parser makes it impossible to unlink something once linked. We have had to create an entire duplicate set of lang: templates just to get the unlinked forms. (As noted, one can't use #if inside the templates, it won't subst.)

But suppose we set up the language templates like this (when linked):

{{{l|[}}}{{{l|[}}}Volapük{{{l|]]}}}<noinclude>[[Category:Language templates|vo]]</noinclude>

(the first two brackets have to be separate, else the parser tries to find the link "too early")

Now,

{{vo}} is [[Volapük]] (linked)
{{vo|l=}} is Volapük (unlinked)
{{subst:vo}} is [[Volapük]] (linked)
{{subst:vo|l=}} is Volapük (unlinked)

in general given a parameter lang, {{{{{lang}}}|l=}} is the unlinked language name. (This still shouldn't be done directly, we always use {{language}} or {{langname}}, precisely so this stuff can be fixed!)
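For readers who want to see why the trick works, here is a small Python model of the parameter-default rule it relies on: {{{l|default}}} yields the default only when l is absent, and yields the (possibly empty) value when l is supplied. The helper names `param` and `vo` are illustrative, and the model ignores the <noinclude> category part.

```python
def param(args, name, default):
    """Model of {{{name|default}}}: the default applies only if name is absent."""
    return args[name] if name in args else default


def vo(**args):
    """Model of the proposed {{vo}} body, without the <noinclude> part."""
    return (
        param(args, "l", "[")      # first opening bracket, kept separate
        + param(args, "l", "[")    # second opening bracket
        + "Volapük"
        + param(args, "l", "]]")   # closing brackets
    )


print(vo())      # [[Volapük]]
print(vo(l=""))  # Volapük
```

Passing an empty l replaces all three bracket fragments with nothing, which is why no #if is needed and why the result survives subst.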

A very nice bit about this is that the transition is simple, linking this way in the templates works with what we have now, and {language} and {langname} can easily work with either method. Doesn't require an abrupt switch. The server overhead is infinitesimal compared with doing a page query for the lang: template as well.

So we can dispense with the lang: templates. Commentez vous? Robert Ullmann 16:37, 13 November 2008 (UTC)[reply]

I don't fully understand how or why this works, but a quick test demonstrates that it does what you say. Clever. However, I don't know enough about the technical impacts to other existing templates to voice any further opinion. --EncycloPetey 18:37, 13 November 2008 (UTC)[reply]

I didn't understand a single word. However, if you want me to use that new format for language templates, and skip the lang: variants, I will do so. -Atelaes λάλει ἐμοί 19:53, 13 November 2008 (UTC)[reply]

That makes a lot of sense, and is a lot cleaner than the existing approach. I say go for it. —Ruakh_TALK 21:15, 13 November 2008 (UTC)[reply]

Looks great. --Bequw → ¢ • τ 02:26, 15 November 2008 (UTC)[reply]

The change to {{vo}} (with the supporting tweak in {{language}}) has resulted in the lang:vo template being orphaned, where it was at 1000+ previously. Looks fairly good. Robert Ullmann 23:41, 13 November 2008 (UTC)[reply]

We might start by making the change only to a limited number of ISO templates, preferably those that exist for languages where a lot of varied editing happens on the part of experienced editors, and so any problems are more likely to be noticed quickly. The catch, of course, is that most such languages don't have the language name linked in the template to begin with. --EncycloPetey 00:02, 14 November 2008 (UTC)[reply]

Because it doesn't change the behaviour when the "l" parameter is missing, and there is no reason why anything would be passing "l" to these previously (since they took no parameters), it is not likely to break anything. (As near "impossible" as these things get in reality, i.e. mostly not, although this is close to provable, failing only because the WMF parser is, um, ill-defined ;-). The interesting bit is templates using the lang: templates directly, but they won't break, just hold up the process. (I fixed {{attentioncat}} a bit earlier, there are probably others.) But, as noted, no hurry here as there isn't any abrupt transition. {{vo}} is done, we can simply continue. Robert Ullmann 00:15, 14 November 2008 (UTC)[reply]

{{grc}} also now has the new code. I'll keep an eye out for any issues. -Atelaes λάλει ἐμοί 00:18, 14 November 2008 (UTC)[reply]

Indeed, good thing to try. Check out Special:WhatLinksHere/Template:lang:grc now. Robert Ullmann 00:32, 14 November 2008 (UTC)[reply]

I'm going to try changing more of them; sort of semi-automatically. Robert Ullmann 12:21, 18 November 2008 (UTC)[reply]

Wiktionary:Requested entries.

Something has suddenly changed in the "Requested entries" link in the navigation box. A link was removed, but I don’t know what the name of it was. It was the page that contained requests for speedy deletion, missing gender requests, requests for scripts, requests for page translation, requests for language clean-up, and requests by specific language. All of that is gone and I have no idea where it went. —Stephen 11:58, 14 November 2008 (UTC)

You mean [[Category:Requests]], which was hidden a few hours ago by this edit: Category:Requests?diff=5524697. I think we should unhide it, since it mostly includes non-content pages from which we do want links to it. If we don't want it to appear on entry pages, then we should have a hidden [[Category:Assorted requests]] subcategory and put the entries in there instead. —Ruakh_TALK 12:59, 14 November 2008 (UTC)[reply]

Yes, that’s the one I was looking for. Definitely unhide. —Stephen 13:05, 14 November 2008 (UTC)[reply]

Shouldn't be hidden, I removed HIDDENCAT. Individual entries in mainspace should be in the language-specific requests cats. A few are turning up here because of missing lang= parameters, e.g. in {{rfc-inflection}} (which I never knew existed ;-). IMHO we are getting a few too many categories hidden. Robert Ullmann 14:40, 14 November 2008 (UTC)[reply]

I also removed HIDDENCAT from Category:Requests by language. Why ever hide that? It only appears in the per-language requests cats, and is the primary cat containing them! Robert Ullmann 14:55, 14 November 2008 (UTC)[reply]

Ditto Category:Translations to be checked. Robert Ullmann 15:08, 14 November 2008 (UTC)[reply]

In general, we have gone overboard in hiding cats; if people don't see requests cats, how are they ever to know what they might help with? Hiding the individual TTBC cats was fine, hiding the checktrans cat so that the cats are hidden entirely was not. Robert Ullmann 15:08, 14 November 2008 (UTC)[reply]

Has there been a discussion about this? I was the one that hid all those categories, as you might have noticed. My reasoning was: all categories that are related to housekeeping, that is, which are not really dividing the content by what they mean, but rather by formatting/completeness or similar should be hidden, since they are, indeed not relevant to the casual user that only wants to find information on a word. I see that some of those categories do not appear on content pages, so there we needn’t be so strict, it was just a logical step.

As for ‘know what to help with’: we could point out the existence of those categories in {{welcome}}, which everyone is supposed to get, right? How about mentioning WT:PREF there as well? H. (talk) 12:43, 18 November 2008 (UTC)[reply]

Could someone who knows how please add a lang= argument to {{homophones}} such that all words link to the given language section. e.g.

{{homophones|bous|bout|lang=fr}}

would link to bous#French and bout#French. Thryduulf 15:47, 14 November 2008 (UTC)[reply]
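As a rough illustration of the requested behaviour (not of how the template is or should be written), the following Python sketch builds section links from a word list and a language code. The `LANGUAGE_NAMES` table is a hypothetical stand-in for whatever code-to-name lookup the template would use.

```python
# Hypothetical lookup from language code to section name.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "pt": "Portuguese"}


def homophone_links(words, lang="en"):
    """Return wikitext links that point at the given language's section."""
    section = LANGUAGE_NAMES[lang]
    return ", ".join(f"[[{word}#{section}|{word}]]" for word in words)


print(homophone_links(["bous", "bout"], lang="fr"))
# [[bous#French|bous]], [[bout#French|bout]]
```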

anyone? — This unsigned comment was added by Thryduulf (talk • contribs) at 19:31, 19 November 2008 (UTC).[reply]

Done. Sorry for the delay. We kind of have a bunch of ways of doing it — {{term}} and {{t}} use slightly different approaches, plus we have {{language}} and {{langname}}. Each has its merits; I chose {{term}}'s approach, as a compromise between simplicity and efficiency. ({{t}}'s approach is more efficient, if there's a bot setting xs=, but that's a pain. Using {{langname}} would have been simpler, but given that a few specific languages are going to be the vast majority of cases, even more so than with translations, I figured it was worth it to grab {{t-sect}}'s short-cut list.) But I didn't add support for sc=, because it didn't occur to me, so it's still only really suitable for Latin-script languages. I hope that's O.K. —Ruakh_TALK 00:37, 20 November 2008 (UTC)[reply]

Thank you. The case I quoted above is the only instance of a non-English homophones section I can recall seeing offhand, so Latin is fine atm. If I (or anyone else) finds a need for script support we can always add/request it then. Thryduulf 02:45, 20 November 2008 (UTC)[reply]

Template:langcatboiler

With more than 29 {if} statements, and with the intention of being used on every Language category, this seems like a serious server drain. Is this reworkable or not? --EncycloPetey 20:40, 14 November 2008 (UTC)[reply]

The #if function itself doesn't create overhead; it is (as you know) that both branches are evaluated, and once someone starts nesting if and switch and sub-template calls you get explosions like {{projectlinks}}.

This is very straight line, I don't see any problem. One might re-factor the conditionals around the box templates to avoid a few calls, but that isn't much. Robert Ullmann 12:28, 15 November 2008 (UTC)[reply]
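Taking the statement above at face value (that both branches of a conditional are expanded), a toy cost model shows why nesting matters and a straight line of conditionals does not. The functions below are an assumption-laden sketch that counts expansions in an idealized model; they do not measure the real parser.

```python
def eager_cost(depth):
    """Expansions for conditionals nested `depth` deep, both branches expanded."""
    if depth == 0:
        return 1  # a leaf: plain text or a single template call
    # One expansion for the conditional itself, plus both of its branches.
    return 1 + 2 * eager_cost(depth - 1)


def straight_line_cost(count):
    """Expansions for `count` un-nested conditionals with two leaf branches each."""
    return count * (1 + 2)


print(straight_line_cost(29))  # 87
print(eager_cost(10))          # 2047
```

In this model the 29 un-nested conditionals cost less than a single chain nested ten deep, which matches the point that the layout of the conditionals, not their number, drives the cost.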

special page listing redirects?

Is there a special page that lists all redirects? RJFJR 07:01, 15 November 2008 (UTC)[reply]

Sorry, I found it. RJFJR 07:03, 15 November 2008 (UTC)[reply]

Special:WhatLinksHere

Any idea why this gives so many completely erroneous results at Special:WhatLinksHere/Template:see? Conrad.Irwin 18:15, 15 November 2008 (UTC)[reply]

Because all of the pages that were edited (by AF or whomever) before the redirect was "flipped", still have the old cached information. A judicious edit to {also} ought to purge them all. (MW "features" strike again ;-) Robert Ullmann 22:43, 15 November 2008 (UTC)[reply]

I've added a tenth parameter to it (as pages were being added to Category:Limit of template reached) which seems to do the trick. Conrad.Irwin 01:59, 16 November 2008 (UTC)[reply]

Oh, just btw Conrad: thanks for bot-converting most of the pages with {see} to {also} without mentioning it anywhere. I just spent 2+ hours trying to figure out why the queue and timers in AF went all screwy on me today, and it wasn't able to complete any see->also edits. Will be another hour before I sleep. Robert Ullmann 22:13, 16 November 2008 (UTC)[reply]

note for everyone the huge block of edits to convert see to also done this morning caused a break in continuity in the update window for the daily XML dump process; edits between 16 Nov 00:00 UTC and 09:00 UTC (very approximately) may not be (almost certainly are not) reflected in today's dump, which would otherwise be expected to be up to the minute; this large block of edits (some 10's of thousands apparently, all of Conrad.Bot's and 9 hours of everything else) will not be fully reflected in the dumps until the daily of 24th November. (New entries are not affected.)

also the AutoFormat process will be running very slowly for the next week, as it updates its task cache; it had been tasked with these changes, and now will do network operations to attempt to do most of them. I will restart it after each daily, so it will get better; but in the meantime it will be spending a great deal of its time trying to do the same edits. Since it slows down after a null edit (no change) thinking there is nothing or not much to do, it will be running very slowly. (The alternative is to chew up my net bandwidth re-checking every single Conrad.Bot edit! Not going to.) Will slowly get better over the next week. (and yes, I've hacked it up to try to minimize this effect, but it won't completely recover until next weekend.) Robert Ullmann 23:28, 16 November 2008 (UTC)

Well, at least it didn't clog up RC. I must admit I'm a little surprised that AF can't simply have one of its many tasks turned off (i.e. tell AF not to bother with see --> also for a few days). Bear in mind that this comes from someone who is utterly incapable of writing something as immensely useful as AF, and is damned glad there is someone who can; surprise, not criticism. Also, bear in mind that I probably won't understand the answer, if there is one, so just keep your "the regex cache can't recompile the string key" nonsense to yourself ;-). In any case, well done both of you for taking care of that rather large and daunting task. I suppose we should now move on to the rest of the templates which are hindering the creation of language templates. Didn't someone write a list of all of them somewhere? -Atelaes λάλει ἐμοί 00:59, 17 November 2008 (UTC)[reply]

One can't just turn off the task: people who aren't here every day will be adding {see} for months into the future; that's why AF has and needs the rule, and also why there was no point whatsoever in doing something already handled. I have hacked in a rule to keep it from trying most of these for a little while; hence the need to re-test. (And note that the task isn't done yet; Conrad used one simple rule, and AF will be doing the several hundred remaining presently.) And AF was also doing other things at the same time: all the minor spacing, as well as other rules. (In particular, sorting {wikipedia} and other known RHS stuff above {also} in the "prolog", which improves presentation (less vertical space).) Now I have to go write code to find that case, where AF would have found them all as it went. All in all, doing the huge block with one rule was lose, lose, and lose. (And now it is 4AM, I am not going to finish retest until at least 5; and there is daytime work to do, ya know? It does look like it is okay, but will be a week before it is fully sorted.) I try to do the same kind of solid engineering I do for clients, and then it just gets casually blown away. I'm not really pissed at Conrad, but he could have maybe asked if it was a good idea, or just maybe a really bad one? It just didn't effing need doing: it was effing handled already. Conrad was doing good useful stuff with the "Acceleration", why take time out to cost me 6-8 solid hours of work? *sigh* I'll stop now. Robert Ullmann 01:18, 17 November 2008 (UTC)

I was unaware it would have such an adverse effect. Sorry. I merely assumed I was moving the edits from being on RecentChanges to off of them. Conrad.Irwin 01:46, 17 November 2008 (UTC)[reply]

Yes, I understand. But in fact moving them off RC is part of the problem: I told BD2412 (on his talk, as you know) to just go ahead and do the few hundreds he wanted to precisely because they were appearing in RC (!): AF was seeing them, reading the edit, and updating its cache (effectively), as well as doing any other related edits. But the big block run off of RC means that right now there are about 25 thousand records difference between AF's cache and the live DB, and the result ain't pretty. (note that other bot edits rarely do anything that changes the status for AF; sometimes when they do, they add tags, like SemperBlottoBot)

It will sort itself, as will the XML dumps; don't worry about it now; cost is sunk. Robert Ullmann 02:01, 17 November 2008 (UTC)[reply]

Just thought I'd say this clearly: I'm not mad at or angry with Conrad. It has just been frustrating. The XML dumps won't be fully sorted for a while, but that mostly means some edits missing that won't affect most uses. AF is set up to get itself re-sync'd. Robert Ullmann 17:54, 17 November 2008 (UTC)

en-verb

Template {{en-verb}} is broken - see break or any of very very many other verbs.

The template itself was edited about a week ago - so something else must have changed. SemperBlotto 12:27, 16 November 2008 (UTC)

I've just spotted this and assumed it was Conrad's insertion of metadata a few hours ago that broke it. However reverting that didn't fix it (so I undid my reversion). A look at a random selection of English verb entries suggests that potentially every single one is affected! Thryduulf 12:59, 16 November 2008 (UTC)[reply]

It uses sub-templates to get the values to display. User:JackOfClubs thought it would be a good idea to put them in a category; as he did so, he added extra white-space to the output, which broke the main template. Should now be fixed, I'll have a scan over his other recent template edits... Conrad.Irwin 13:12, 16 November 2008 (UTC)
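The thread does not show the actual edit, but one common way for this kind of breakage to occur is a line break placed before a <noinclude> block: the block is dropped on transclusion while the line break in front of it is kept. The Python sketch below models that under this assumption; the category name is made up.

```python
import re

# Matches a <noinclude> section, including any line breaks inside it.
NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL)


def transclude(template_source):
    """Drop <noinclude> sections; all other text, whitespace included, is kept."""
    return NOINCLUDE.sub("", template_source)


# Line break inside the <noinclude>: it disappears with the block.
good = "breaks<noinclude>\n[[Category:Hypothetical subtemplates]]</noinclude>"
# Line break before the <noinclude>: it leaks into the caller's output.
bad = "breaks\n<noinclude>[[Category:Hypothetical subtemplates]]</noinclude>"

print(repr(transclude(good)))  # 'breaks'
print(repr(transclude(bad)))   # 'breaks\n'
```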

*sigh* I had fixed a few and asked him to check the rest. I suppose I should have just gone back through all of them myself. Robert Ullmann 22:15, 16 November 2008 (UTC)[reply]

<centralnotice-template-2008_meter_1>

This, or similar, is now appearing at the top of every page. Have the lunatics taken charge of the asylum today? SemperBlotto 16:21, 16 November 2008 (UTC)[reply]

I hope that wasn't because of me --Jackofclubs 16:24, 16 November 2008 (UTC)[reply]

template:new alt

Any objection to my changing this from

==English==

===Noun===
'''{{<includeonly>subst:</includeonly>PAGENAME}}'''

# {{alternative spelling of|[[<includeonly>{{{1}}}</includeonly>]]}}

to

=={{<includeonly>subst:</includeonly>lang:{{{lang|en}}}}}==

==={{<includeonly>subst:</includeonly>ucfirst:{{{2|Noun}}}}}===
{{infl|{{{lang|en}}}|{{{2|noun}}} }}

# {{alternative spelling of|[[{{{1}}}]]{{<includeonly>subst:</includeonly>#if:{{{lang|}}}|lang={{<includeonly>subst:</includeonly>lang:{{{lang}}}}}}}}}

or can someone fix it, as needed, or suggest something else, similar?—msh210℠ 06:06, 17 November 2008 (UTC)[reply]

I think that it should be the following code (now that the template uses langname). We should also encourage people to replace the {{infl}} with whatever is more specific. Conrad.Irwin 09:25, 17 November 2008 (UTC)[reply]

=={{subst:{{{lang|en}}}}}==

==={{<includeonly>subst:</includeonly>ucfirst:{{{2|Noun}}}}}===
{{infl|{{{lang|en}}}|{{<includeonly>subst:</includeonly>lc:{{{2|Noun}}}}}}}

# {{alternative spelling of|[[{{{1}}}]]{{<includeonly>subst:</includeonly>#ifeq:{{{lang|en}}}|en||{{<includeonly>subst:</includeonly>!}}lang={{subst:{{{lang}}}}}}}}}

Did you deliberately subst the L2 header? If so, why? Also, wouldn't that last lang={{subst:{{{lang}}}}} add brackets, ruining things?—msh210℠ 06:56, 18 November 2008 (UTC)[reply]

That last is not correct, lang= is used only for the category. Question: do you want to put these in the POS cats? Using {infl} will add it to the noun category. Which we haven't usually done for alt spellings. Robert Ullmann 12:19, 18 November 2008 (UTC)[reply]

I do not understand your first sentence, RU. Would you mind explicating? If it's accepted practice to omit them from the POS cats, then fine, omit them, but I didn't realize that it is; I've been using {{infl}} on alt.sp.s myself.—msh210℠ 20:05, 19 November 2008 (UTC)[reply]

Since conversation has stalled, I'm going, if there's no objection soon, to change it to

=={{<includeonly>subst:</includeonly>{{{lang|en}}}}}==

==={{<includeonly>subst:</includeonly>ucfirst:{{{2|Noun}}}}}===
{{infl|{{{lang|en}}}|{{<includeonly>subst:</includeonly>lc:{{{2|Noun}}}}}}}

# {{alternative spelling of|[[{{{1}}}]]{{<includeonly>subst:</includeonly>#ifeq:{{{lang|en}}}|en||{{<includeonly>subst:</includeonly>!}}lang={{subst:{{{lang}}}|}}}}}}

—msh210℠ 19:22, 6 January 2009 (UTC)[reply]

Is this ever used as a preload template, or is it always used manually as {{subst:new alt|main spelling}}? If the former, then I'd rather not complicate it more than necessary, since editors will see the insanely complex wikitext and might be confused and/or deterred by it. If the latter, then sounds good! —Ruakh_TALK 20:52, 6 January 2009 (UTC)[reply]

I use it manually with subst. Not sure about others' use.—msh210℠ 17:12, 7 January 2009 (UTC)[reply]

Done.—msh210℠ 19:53, 12 January 2009 (UTC)[reply]

`{{suffixcat}}`

This template was added to all Hungarian words suffixed with xx categories and, as a result, the categories are no longer sorted. Should this template handle sorting? --Panda10 23:35, 17 November 2008 (UTC)[reply]

I've changed it to sort on the suffix for now, I suspect the * was there to try and sort all the Categories onto the first page of the "{{{language}}} suffixes" category, but as this didn't work I can't see why it should be there. (You'll have to wait an hour or so while it re-renders all the pages until you see any effect) Conrad.Irwin 09:37, 18 November 2008 (UTC)[reply]

Template:given name

Updating it to use the categories set up by Makaokalani, properly named with language names, not xx: codes (as they are not topic cats ;-). Also fixing a bug that got by somehow. We will be working on making a surname template in a similar manner. Robert Ullmann 14:28, 18 November 2008 (UTC)[reply]

Adding a reference for a new dictionary

For Chinese characters the references are specified by abbreviation=location, I wish to reference a dictionary not in the present list. How should I set up a new dictionary to refer to? Johnkn63 01:40, 19 November 2008 (UTC)[reply]

Basically, we would just set up a reference template, like those seen at Category:Reference templates. If you are unable to do so yourself, someone can probably do it for you. Basically, the info we'd need to do so is the name of the reference. If it is an online reference, we would also need the address. -Atelaes λάλει ἐμοί 08:15, 19 November 2008 (UTC)[reply]

Thank you for your suggestion, which was new to me. However, looking at the link, this is not quite what I am looking for. The template starts Han ref followed by a bar, and then for example "kx=nnnn.mmm" becomes "Kangxi, page nnnn, character mmm", all inside doubled curly brackets. Johnkn63 13:44, 19 November 2008 (UTC)[reply]

Ah. There are two things you can do: create a new template, and just add it in entries following {{Han ref}}, or, tell me what the dictionary is so I can go figure it out. It is not a simple template, as you can see! The existing reference numbers are from the Unicode/ISO Han Unification "Four Dictionary" process. Robert Ullmann 16:16, 19 November 2008 (UTC)[reply]

Dear Robert, any help in making a template would be much appreciated. I of course recognized the dictionaries; the data itself may come from Unihan.txt. The four dictionaries mentioned have basically already been encoded. The dictionary, Sawndip Sawdenj, has recently been cited at the IRG http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg31/IRGN1528R-FurtherDevidence.pdf . A reference like snsd page [3 digits] . entry [2 digits] and character [2 digits] would be one approach. So for 伝 this would be snsd492.0201, i.e. Sawndip Sawdenj Page 492 Entry 2 Character 1. I am not certain as to the best name to use for the dictionary: in Zhuang the name is Sawndip Sawdenj ("The half-baked characters dictionary"), the Chinese title is <古壮字字典> ("The Dictionary of Ancient Zhuang Characters"), and there is no commonly used English title for the dictionary. Using "Sawndip Sawdenj" would seem best to me, though I would welcome a second opinion.Johnkn63 01:18, 20 November 2008 (UTC)[reply]
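As a rough illustration of the key layout proposed above (not anything {{Han ref}} currently does), the reference could be split up as follows. The pattern, "snsd" plus a 3-digit page, a dot, a 2-digit entry and a 2-digit character, comes from the proposal; the function names `parse_snsd` and `format_snsd` and the output wording are hypothetical.

```python
import re

# Assumed layout, following the proposal above: "snsd" + 3-digit page,
# a dot, then a 2-digit entry number and a 2-digit character number.
SNSD_PATTERN = re.compile(r"snsd(\d{3})\.(\d{2})(\d{2})")


def parse_snsd(ref: str) -> dict:
    """Split a key such as 'snsd492.0201' into page, entry and character."""
    match = SNSD_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid snsd reference: {ref!r}")
    page, entry, character = (int(part) for part in match.groups())
    return {"page": page, "entry": entry, "character": character}


def format_snsd(ref: str) -> str:
    """Spell the key out in words (hypothetical wording)."""
    fields = parse_snsd(ref)
    return (
        f"Sawndip Sawdenj, page {fields['page']}, "
        f"entry {fields['entry']}, character {fields['character']}"
    )
```

With the example from the comment, `format_snsd("snsd492.0201")` spells out page 492, entry 2, character 1; a key with the wrong number of digits is rejected rather than guessed at.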

Why don’t the nocap and nodot parameters of Template:nth s ind sim pres of work?

I tried several things, like =1, =true etc. Looking at the source, I see no reason why this would not work. Can someone have a look at this? H. (talk) 15:27, 19 November 2008 (UTC)[reply]

From what I recall, parameters can't be passed that way: nodot={{#if:{{{nodot|}}}|1}} might work better. -- Visviva 16:00, 19 November 2008 (UTC)[reply]

Yes, it is painful. The problem is that "nocap" and "nodot" are very poor design, they use a parser conditional instead of parameter defaults, and are (obviously!) backwards. (I should have fixed this the very first time I saw it; I had no idea it would spread so far.) Done properly, the obvious way to pass the parameter would work. See {{surname}} for an example of "dot=" set up correctly, you would then be able to just write |dot={{{dot|.}}}|...

None of which solves your problem! (sorry) So to pass the "nodot" parameter on a subcall, do this:

...|{{#ifeq:{{{nodot|+}}}|{{{nodot|-}}}||x}}nodot={{{nodot|}}}|...

(Want to know how the magic works? If "nodot" is present, "1" or whatever, both sides of the #ifeq have the same value, so it resolves to (blank) and the parameter is nodot=1. If "nodot" is not present, the two cases default to "+" and "-", i.e. not equal, so the #ifeq resolves to "x" and the parameter is xnodot= (with an empty value), which is ignored as the subtemplate has no "xnodot" parameter ... did you really want to know that? ;-)

In this particular case, given the way "nodot" is typically tested, Visviva's code should work, but that is relying on how the subtemplate tests the parameter. (Oh, your specific problem was that "nodot=1" was being treated as a named parameter to the #if function ... you might have written ...|{{#if:{{{nodot|}}}|2=nodot=1}}|... and just confused everyone except the MW parser ;-)

sigh Robert Ullmann 16:36, 19 November 2008 (UTC)[reply]
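For anyone who would rather see the forwarding trick above as ordinary code, here is a small model of it. It is only a sketch: the function names are made up, parameters are a plain dict, and the comparison is a plain string comparison (the real #ifeq also compares numbers numerically, which makes no difference here because both sides come from the same parameter).

```python
def forward_nodot(caller_params: dict) -> str:
    """Return the argument text the wrapper would hand to the subtemplate."""
    # {{{nodot|+}}} and {{{nodot|-}}}: the same value when nodot was supplied,
    # the two different defaults when it was not.
    left = caller_params.get("nodot", "+")
    right = caller_params.get("nodot", "-")
    # {{#ifeq:left|right||x}}: empty when equal, "x" otherwise.
    prefix = "" if left.strip() == right.strip() else "x"
    # {{{nodot|}}}: the supplied value, or empty when absent.
    value = caller_params.get("nodot", "")
    return f"{prefix}nodot={value}"


def subtemplate_sees_nodot(argument: str) -> bool:
    """True if the forwarded argument defines a parameter named 'nodot'."""
    name, _, _ = argument.partition("=")
    return name == "nodot"
```

In this model, a caller that supplies nodot (even with an empty value) forwards a parameter really named "nodot", while a caller that omits it forwards a harmless "xnodot" that the subtemplate never looks at, so an existence test in the subtemplate behaves as intended.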

Hm, I think I understand what you write but am unsure where to go from here. What would you suggest, that I simply insert your example code above, or that I more thoroughly fix the templates as you describe in the first paragraph, and if yes, how exactly? H. (talk) 13:26, 15 December 2008 (UTC)[reply]

Personally, I'd suggest that you not worry about it. The template currently works, and I think Robert is wrong that this has to do with "the way 'nodot' is typically tested"; rather, it has to do with the meaning of nodot=, as used (I'm pretty sure) by all templates that currently have it. It's not an existence parameter, it's a truth-value parameter, and any template that tests for its existence is doing it wrong. He's right that {{{dot|.}}} would be better, but I think we're best off leaving it to him (and AF) to implement that change and handle the transition. (And {{{cap|F}}} is also a good idea, but is rather tricky, and wouldn't make sense with this template's current M.O.; |nth=1|cap=f would be silly at best. So again, we should leave it to Robert to handle that how he wants.) —Ruakh_TALK 15:20, 15 December 2008 (UTC)[reply]

Formatting Cyrillic

Note: this discussion was formerly at [[Wiktionary:Beer parlour]].

I know this has been discussed before, at least regarding non-Latin scripts in general, but I can't find that conversation.

Currently, adding a script parameter (e.g. sc=Cyrl) to most templates turns off the normal bold or italic formatting. This makes sense for many language scripts, in which this text weight or font style doesn't have meaning, or is not supported by common fonts. Also, some general style manuals (e.g. in OUP style) recommend rendering isolated words or short quotations in transliteration, or if necessary, in the foreign script but without italics—in either case they are sufficiently distinguishable from the surrounding English prose, and this may avoid confusion by exposing English-language readers to some unfamiliar cursive Cyrillic forms (e.g. г = г, д = д, т = т).

But bold and italic formatting are native to the modern Cyrillic script, and well supported in common fonts. Wiktionary is not occasionally using foreign terms in English prose, but has chosen to present all terms in their native form, and always supplement them with romanizations. Dumbing down the display of Cyrillic text in this context just reduces its integrity, and there's no reason for this compromise.

Inflection templates should display the headword in bold for sc=Cyrl (I've been omitting the script code for this reason, because there seems to be no problem in displaying Cyrillic). The worst case is that no bold Cyrillic font is present, so the word will be displayed in normal weight.
The term template should probably display Cyrillic terms in italics. (This doesn't apply to the Church Slavonic Cyrillic script, sc=Cyrs, because it is not influenced by Latin typography). —Michael Z. 2008-11-19 17:37 z

Objections? Technical hurdles? —Michael Z. 2008-11-19 17:37 z

As I see it, we have three basic kinds of formatting for, say, French words:

plain formatting (in lists, translations tables, etc.)
italics (mentions in etymologies, usage notes, etc.)
bold (headwords, and lemmata of form-of defs; I think that's it)

In addition, we allow users to customize some of this using CSS. (For example, my monobook.css has .use-with-mention { font-style: italic; }, so for me, {{form of}} senses appear entirely in italics.)

I don't think it's really necessary to distinguish plain formatting from italics for non-Latin-script words; the only reason we use italics for mentions is to set words off, and with non-Latin-script words, the script itself does that. However, if the editors that deal with a given script feel that we should italicize mentions in that script, I'm fine with that; so, it needs to be possible.

I do think we should distinguish plain formatting from bold; but a lot of scripts are a lot less readable in bold, so for those it would be better to use a larger font size instead of a heavier font weight.

So, I'd suggest that the various templates that support sc= indicate to the script template (using a standardized parameter, call it fmt= or something) what format they use by default: plain, italics, or bold. The script template can then handle that however it wants. And as long as we provide classes for user CSS to hook onto, we'll have all our bases covered.

—Ruakh_TALK 18:23, 19 November 2008 (UTC)[reply]

Yes, this has been mooted before. We should add "face=" (the proper term) to the script templates, to offer bold, ital, and a variant or two to the calling templates; this will also allow the fixing of the italics in the Latn template to be controlled by caller, etc. This discussion/section desperately needs to be moved to WT:GP. It is perfectly sortable. Robert Ullmann 23:29, 19 November 2008 (UTC)[reply]

Implementing this all on our own using templates could be complicated and inflexible. The HTML and CSS which make up our web pages already have a mechanism for specifying the structural role of an element, its language, and how it should be styled. We should leverage these standards in our templates, to make it easier for registered editors, readers with modern web browsers, or derivative projects to restyle dictionary entries.

For example, {{term}} writes out ... around the term, and it is made italic by a CSS declaration in a style sheet: .mention-Latn { font-style: italic; }. This is not bad, but could be improved in a few ways, to make it conform better to the specifications, and make it more flexible.

Instead of a meaningless span, the term could be enclosed in an i (italic) presentation element, which at least sets the default styling, or an em (emphasis) structural element.
Language and script should be properly indicated with a lang attribute, which can specify language, region, language script, and variant. E.g., ... for French, sr-Latn and sr-Cyrl for Serbian using either the Latin or Cyrillic alphabet.
The style sheet can select the styling for any element/class/language/region/script/variant combination, e.g., i.mention:lang|="Cyrl" { font-style: normal; } would make all terms in Cyrillic script un-italicized. They all get specified once in a central place, and are easily overridden.

Similarly, the headword could be placed in a dfn element (defined term) by its inflection-line template: <dfn class="headword" lang="cu-Cyrs">.... The style sheet could specify which languages' or language scripts' headwords get bolded. —Michael Z. 2008-11-20 07:42 z

Note that we put all that stuff inside the script templates. {{term}} doesn't use "mention-Latn", the {{Latn}} template does; it should be fixed to do that only if face=term (and then use "mention-term" or something, not particular to Latn ;-). As you say, we don't have to "implement that all on our own", but we do have to invoke it, and that is done within the script templates; hence we want a parameter to them to specify italic, bold, headword, mention etc. The script templates are a lot more accessible and understandable, and it is best to keep the HTML tags out of templates like {{infl}}. The various templates like that should and would just call {{ {{{sc}}}|face=bold|(text) }} and the template either uses b or span as appropriate for the script, as well as other things. (You might be surprised.) At the same time, there aren't that many script templates. Repeating some boilerplate between them is no big deal. Hiding all the variants and CSS invocation within them is essential.

Keep in mind that Objective One is to make the presentation as good as possible with the bog-standard default settings of IE, Firefox, Safari, and Chrome. Also keep in mind that CSS is utterly intractable to most people, but templates are much more accessible, not to mention editable. We can always shift things into CSS as it is understood what is needed. (The existing set of script templates have been edited by dozens of people, of which I think only two grok CSS.) Robert Ullmann 12:33, 20 November 2008 (UTC)[reply]

Yikes! I see that .mention classes are sometimes added by {{term}}, sometimes by {{Latn}}, and sometimes not at all, and then the italic formatting is chosen in the style sheet MediaWiki:Common.css. This spaghetti isn't serving either template editors or CSS editors as well as a more modular system could. Is Cyrl the only exception, or is each script handled in its own special way? And where do formatting exceptions for individual languages or variants belong? I will try to untangle the details (is it centrally documented anywhere?), and see if I can propose some graceful streamlining. Of course accessibility for editors is important, but the efforts of editors should best serve the readers and re-users of the dictionary. —Michael Z. 2008-11-20 17:21 z

I modified {{infl}} and {{Cyrl}}, the first to use face=head when calling the script templates, and the latter to select tag ([span, i, b]) based on the face ([head, term, ital, bold]) and that is as simple as it gets. (never mind the ancient "RU" class ;-) Other script templates alter font sizes and such; they can now be optimized. (Arabic in particular.) Robert Ullmann 12:51, 20 November 2008 (UTC)[reply]
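The face-to-tag selection described above might be pictured like this. The discussion names the faces (head, term, ital, bold) and the tags (span, i, b) but not the exact pairing, so the table below is an assumption, as are the names `FACE_TO_TAG` and `render_cyrl`; the real {{Cyrl}} also deals with classes and fonts, which this sketch leaves out.

```python
from html import escape

# Assumed pairing of face= values with HTML tags; the thread lists both
# sets but does not say which face maps to which tag.
FACE_TO_TAG = {"head": "b", "bold": "b", "term": "i", "ital": "i"}


def render_cyrl(text: str, face: str = "") -> str:
    """Wrap text in the tag chosen for the given face (span if none)."""
    tag = FACE_TO_TAG.get(face, "span")
    return f"<{tag}>{escape(text)}</{tag}>"
```

A caller such as {{infl}} would then only need to pass the face it wants (here, `render_cyrl("горілка", "head")`), and callers that omit the face fall back to a plain span.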

Er, could you explain those four names? I suppose I get the difference between "head" and "bold" (I imagine that few if any script templates will distinguish them, but user CSS might), but aren't "term" and "ital" exactly the same thing? And what about {{onym}}, {{projectlink}}, {{t}}, etc.? (Or would they omit the parameter entirely?) —Ruakh_TALK 17:29, 20 November 2008 (UTC)[reply]

A number of script templates will distinguish head and bold: for example Hant would want to do size 125% (or so) for head, and nothing for bold. "term" and "ital" are not the same thing: there was a vote to make {term} display Latin (specifically) in italics, specifically customizable separately from any other italics. (Whether "ital" has any other use might be debatable ;-) Other things (like your examples, but don't the project links use bold?) will just omit the parameter, and get normal size, face, and weight; but by "normal" we mean with the adjustments we make to make various scripts work well mixed with Latin script in running text: Arabic is increased in size, as the default is tiny, and others (these are in the .XX classes now) Robert Ullmann 19:37, 20 November 2008 (UTC)[reply]

See горілка, which you were trying to fix. Note that the font selected by RU (the class, not me!) appears with a heavier-weight bold (in FF at least). Robert Ullmann 12:56, 20 November 2008 (UTC)[reply]

Weird: yesterday горілка's headword was in normal-weight font, before I removed sc=Cyrl. Now it's bold, but with the ugly, ugly Arial font applied instead of the default—(perhaps it was before, but less noticeably without boldface). It looks bad, especially in contrast to the bold plural link in the same line, which doesn't have this font applied. Something is out of whack here, and I'm not sure what is the best way to synchronize these things. Is there any reason not to apply the Arial font only for MSIE, as is done in Wikipedia's font templates? —Michael Z. 2008-11-20 17:21 z

It was applied before with sc=Cyrl, class = "RU", but, as you note, it isn't that noticeable without being bolded. I think it used to be hidden from all but MSIE, but that was changed? Robert Ullmann 19:37, 20 November 2008 (UTC)[reply]

I see it was removed March 16. I don't really understand the edit summary (“remove inherit hack for .RU to allow identical inline font-family to be removed from Template:Cyrl cleanly”).

The editor who changed it is inactive for the last month. Any objection to restoring the hack? This would let all browsers display Cyrillic with their default mechanism, but continue to help out MSIE, which is a bit slow in the language scripts department. Of course it can still be overridden by any reader who chooses to, with a Wiktionary editor style sheet, or web-browser user style sheet. —Michael Z. 2008-11-20 20:20 z

I see that the ugly display is because my Arial Unicode MS doesn't have a bold font, so my system is creating an artificial one. Dunno why it didn't show up that way before. The plain Arial also looks a bit different than the default font in both Safari/Mac and FF/Mac, so it shouldn't be forced on all readers. —Michael Z. 2008-11-21 00:10 z

I am restoring the inherit workaround for MSIE to class .RU, in the style sheet MediaWiki:Common.css. The Cyrillic font list will only continue to be applied in MSIE, and modern browsers can display Cyrillic by their native inheritance mechanism. —Michael Z. 2008-11-25 16:09 z

Uh-oh. I just found the previous relevant discussions. It seems the reason Cyrillic font was being applied in all browsers was because acute accents over Cyrillic letters were broken in some browsers.

The workaround, which I've been applying anyway, is to apply the acute accent indicating stress in transliterations, e.g., ковбаса (kovbasá).

Is this still a problem? Are there any other situations which require forced Cyrillic fonts? —Michael Z. 2008-11-25 21:50 z

DerbethBot again

DerbethBot (talk • contribs) is at it again, this time adding plural pronunciations to the entries for the singular.

This bot operator is just too careless, I'd support a de-botting. Robert Ullmann 15:50, 20 November 2008 (UTC)[reply]

The audio pronunciations are also seemingly being placed randomly with regard to existing elements of the pronunciation sections. Is there a reason why Connel's bot (that adds audio files correctly) isn't doing this work?

Anyway, I support a debotting until the operator shows they can get it right - perhaps even demonstrating this before every bot run will be necessary. Thryduulf 16:01, 20 November 2008 (UTC)[reply]

It was Dvortybot I think you are recalling, not a Connel bot. And it did add the audio after the IPA line, and so forth, basically correctly. Robert Ullmann 16:20, 20 November 2008 (UTC)[reply]

Oh great, is it a common practice on English Wiktionary to block people who you don't agree with so that they don't write their opinion? For me Robert Ullmann either wanted to shut my mouth not to be disturbed in writing rubbish about me or he is too dumb to use the block page properly. In case you don't know, blocking a bot for a day without disabling the "autoblock" option causes its operator to be unable to do anything; this includes posting an explanation or reverting bot changes.

Bloody hell, I have added about 12,000 audio files to English Wiktionary and you still treat me no different than Willy On Wheels, giving me no chance for even explaining myself or correcting my mistake! Do I deserve a one-day block without any attempt to contact me and ask what my bot is doing?

My bot has been adding audio files for plural forms because this is the only way audio files should be added on German or Polish Wiktionary, so I assumed in good faith that it should also be done on English Wiktionary - and devoted my time to implement and test it especially for your project. Can you point me to where in WT:ELE#Pronunciation it says that plural audio files should not be added to singular-form entries?

You write that pronunciation files are placed randomly - someone has to be blind here, because for me it seems that they're added at the beginning of the pronunciation section. Perhaps you were looking at contributions of another bot, huh? With nearly 12,000 pages edited by my bot since November 2007, no one has written to me with any complaint that placement of added audio files should be different - so I assumed that everything is all right. I do not edit English Wiktionary, I'm not up to date with all its rules, but if you had requested a change in bot behaviour, I would have performed it.

I have turned off adding of plural audio files on en.wiktionary and changed placement of new audio files. If you want, I can demonstrate that the bot is working right. You want me to prove that the bot is functioning right every time it begins work - as you wish, this would even help me, provided that someone competent would be able to review the bot's job and talk with me in a civilised way.

If you are going to use another bot for adding audio files - great, I wonder how well it will cope with utilizing the 16,000 new Ukrainian and 8,000 new Belarusian files (hey, and guess who has uploaded them). On German and Polish Wiktionary the bot finished the whole Category:Pronunciation on Commons within four days. --Derbeth ^talk 18:16, 21 November 2008 (UTC)[reply]

I know nothing about how things are done on other Wiktionaries, but on this Wiktionary bot owners are required to demonstrate that their bot is running correctly before being left to run automatically. This is normally the case before each new task, but not for additional runs of the same task. This is not a hard and fast rule though and is adapted to fit the circumstances of the bot, task and author. The acceptance is normally given by other bot authors (of which I am not one) who understand what is required.

Regarding the placement of the audio files, our policy and style guides (have you looked at them?) make it clear that only content that relates to the word being defined should be part of that entry. Things like pronunciation of plurals therefore belong on the entry for the plural form. The structure of the pronunciation section has changed a little recently, and our style guides are inconsistent (something I have just noticed) but the accepted format is

; Part of speech 1
* {{a|region 1}} {{enPR|enPR pronunciation}}, {{IPA|/IPA pronunciation/}}, {{SAMPA|/SAMPA pronunciation/}}
*: {{audio|audio file.ogg|Audio (region 1)}}
*: {{rhymes|rhyme}}
* {{a|region 2}} {{enPR|enPR pronunciation}}, {{IPA|/IPA pronunciation/}}, {{SAMPA|/SAMPA pronunciation/}}
*: {{audio|audio file.ogg|Audio (region 2)}}
*: {{rhymes|rhyme}}

; Part of speech 2
* {{a|region 1|region 2}} {{enPR|enPR pronunciation}}, {{IPA|/IPA pronunciation/}}, {{SAMPA|/SAMPA pronunciation/}}
*: {{audio|audio file.ogg|Audio (region 1)}}
*: {{audio|audio file.ogg|Audio (region 2)}}
*: {{rhymes|rhyme}}

Obviously not all elements will be present in all cases, but this should show you where the audio file should be placed. My comment about random positioning was that looking at a sample of the bot's contributions, there appeared to be no apparent consistency with where the audio file was placed with respect to pre-existing parts of the pronunciation section. If you had followed the old (documented) standard then all of the audio files would be placed at the end of the pronunciation section. It's true that a large number of French entries have the audio before the IPA, but this is incorrect.

I appreciate that you have put a lot of work into this, and we appreciate the content, but the block (as I understand it, of your bot not you) was to prevent it causing us more work sorting out the errors it has made. For example it is not unlikely that every contribution made will have to be reviewed by a human and the audio pronunciation moved to the correct entry/place in the entry by hand.

The previous bot I referred to has, in the past, added a very significant number of audio files correctly, so I was surprised that it wasn't being used to do the job again. Thryduulf 18:55, 21 November 2008 (UTC)[reply]

The additions were also, in some cases, incorrect no matter where they were placed. Just because an audio file exists does not mean that the content of that audio file is correct. At least a couple of the additions I saw were items that had deliberately not been linked because the audio file content was wrong. Thus, you were not filtering the content of additions, but merely linking files based on their name alone. This is never a good idea. --EncycloPetey 19:43, 25 November 2008 (UTC)[reply]

If the content of an audio file is wrong the right thing to do is to request its deletion from Commons. While it is the responsibility of the bot owner (Derbeth, in this case) to make sure that his bot respects all applicable policies (rather than simply assuming analogy to other wikts) it cannot be expected that he checks each individual file. Under that condition, adding audio files wouldn't be a task for a bot any longer. -- Gauss 23:54, 25 November 2008 (UTC)[reply]

That's at variance with Commons policy. They wouldn't delete a file simply for erroneous content. Getting anything accomplished there is like pulling teeth. And I agree with you that adding links to audio files is not appropriate for blind bot automation. --EncycloPetey 00:06, 26 November 2008 (UTC)[reply]

You misunderstood me. I think that DerbethBot is in principle useful – under the condition that it follows all applicable formatting policies (and in response to Derbeth: WT:ELE#Pronunciation says: Ideally, every entry should have a pronunciation section..., which wouldn't make any sense if plural pronunciations were to be placed in the entry for the singular form). Also, DerbethBot was unanimously approved (well before my time) and in the relevant Beer parlour discussion, no objections were raised that would imply a screening of the audio files by Derbeth. I am not too familiar with the procedures at Commons but I don't quite see why the wik* deletion mechanisms shouldn't work there. Aren't there enough knowledgeable people who say 'this is wrong' when something is wrong? In the only case I followed, the deletion of a bad audio file did happen without any problems (true, it took 6 weeks, but our own RFD also has quite a few items even older than that). -- Gauss 00:28, 26 November 2008 (UTC)[reply]

Thanks for the comments. First of all - the issue with audio files deliberately not used in entries. It is not possible for an automatic program to judge whether given file is or is not welcome in an entry; if someone removed an audio from entry, but not the audio itself, the bot will insert this audio file again. In case audio file on Commons has unacceptable quality, it should be deleted from Commons. On the other hand, if you decide that adding audio files should not be done automatically, I can generate lists of audio files waiting to be added to entries so that you resolve this task manually. I mean something like [1].

As for the format of pronunciation section - how about entries that have the old section layout or are not in English? For example, let's take woordenboek. My bot has added an audio file there in April. How should this entry look according to current rules? --Derbeth ^talk 00:06, 27 November 2008 (UTC)[reply]

The only issue I see with woordenboek is that the IPA should precede the audio file link. We usually have the phonetic transcription first, then the audio file, then the rhymes link, then any homophones, and hyphenation last (when present). When there is more than one regional pronunciation, this structure is repeated for each pronunciation, with items relating to the same region grouped together. At least that's the current ideal; there are many entries that don't live up to that standard yet. It would also be nice for the audio file to note the dialect of the speaker, since Dutch does have notable regional variation in pronunciation. That information isn't always available, but ought to be included when it is possible to do so. --EncycloPetey 00:40, 27 November 2008 (UTC)[reply]
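For a bot author, the ordering just described (transcription, audio, rhymes, homophones, hyphenation) could be applied along these lines. This is a sketch only: the kind labels, the name `order_pronunciation`, and the idea of representing a section as (kind, wikitext) pairs are assumptions, and it handles a single regional group rather than the repeated per-region structure.

```python
# Order taken from the comment above; the labels themselves are hypothetical.
PRONUNCIATION_ORDER = [
    "transcription",
    "audio",
    "rhymes",
    "homophones",
    "hyphenation",
]


def order_pronunciation(items: list) -> list:
    """Sort (kind, wikitext) pairs into the described order.

    The sort is stable, so two lines of the same kind keep their relative
    order; unknown kinds are left at the end rather than dropped.
    """
    rank = {kind: index for index, kind in enumerate(PRONUNCIATION_ORDER)}
    return sorted(items, key=lambda item: rank.get(item[0], len(rank)))
```

Given a section where the audio line precedes the IPA line, the function returns the IPA line first and the audio line second, which is the one change asked for in the woordenboek example.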

I have a proposal: I'll run my bot on languages other than English and won't touch entries using template {{a}}. Making my bot aware of new rules would require too much work for me at the moment; as for me they are not very well suited for automatic programs. I've already made a fix to placement of audio files in entries using old layout of pronunciation section. If you agree, I will perform a small number of test edits in several languages to demonstrate that everything is all right. --Derbeth ^talk 14:45, 28 November 2008 (UTC)[reply]

If so, would it be possible to make lists of which Swedish entries you add audio to? In many cases there are homographs in Swedish which are not yet split into separate etymology headers, let alone pronunciation headers - so your bot added a pronunciation which really only is valid for one of the senses or one of the PoSes. I have been around trying to add (empty) etymology sections for those words so that I could move the pronunciation section to the right place, but manually looking through a list of contributions of a bot to find which entries are Swedish, and thence possible for me to correct, quickly gets pretty tiresome. \Mike 17:27, 28 November 2008 (UTC)[reply]

It also wouldn't be worthwhile to add Latin audio files. I have personally created almost all of those, and they're already linked. Most of the Latin audio files that aren't linked are deliberately not linked, either because the pronunciation is skewed or because the audio quality is inferior. Some files are too faint or too staticky to be useful. --EncycloPetey 17:32, 28 November 2008 (UTC)[reply]

Ok: last edits, User:DerbethBot/Audio files/sv. --Derbeth ^talk 17:09, 30 November 2008 (UTC)[reply]

Looks good to me, as far as I can tell. It even adds the blank behind the * now ... Regarding your question on old entries above (see woordenboek), the order within the Pronunciation sections doesn't seem to be a top priority: I asked our best nitpicker if he really minds, and he didn't. (Of course, that doesn't mean that new additions may freely ignore the consolidation of formatting policy that took place in the meantime.) -- Gauss 18:21, 30 November 2008 (UTC)[reply]

For the Hungarian entries: I was instructed by User:EncycloPetey to put the audio line after IPA but before hyphenation. Also, I always use capital for the Audio label that is displayed. Could you make these changes? --Panda10 18:47, 30 November 2008 (UTC)[reply]

Remark: It is not made explicit in the policy documents, neither in WT:ELE nor in Wiktionary:Pronunciation, that the lines labelled as "additional items" must go at the end. It has not been mentioned earlier in the present discussion here. Of course it makes perfect sense. -- Gauss 22:52, 30 November 2008 (UTC)[reply]

Ok, I've added these features. Latest bot edits: [2]. Any other suggestions before I run the bot in normal way? --Derbeth ^talk 11:36, 3 December 2008 (UTC)[reply]

Contrary to my last remark, the order within the pronunciation section ~~is actually~~ seems to be specified by the layout policy, see WT:ELE#Additional_headings. It says (relevant lines only)

===Pronunciation===
*Hyphenation
*Rhymes
*Homophones
*Audio files in any relevant dialects

In particular, the advice Panda received from EncycloPetey ~~violates~~ is in contradiction to this ~~policy~~ paragraph (the advice that audio files should go at the end of the pronunciation section). Derbeth, sorry for the confusion. As the order within the pronunciation sections does not seem to be enforced by AutoFormat this is probably a minor issue, though. -- Gauss 20:52, 4 December 2008 (UTC)[reply]

You know perfectly well, Gauss, that the "policy" you refer to does not state any such order. You are referring to an antiquated example in the ELE that has not been kept up to current standards (as is true for many parts of the ELE). No policy has been violated. If we are going to stick to the "policy" that you've noted, then you'll need to revert your recent changes to {{pl-noun}}. The example you've cited clearly shows that the line following the Noun header is for Declension, which is information you removed from that template. You can't claim that the example on ELE is policy, accuse others of violating it, then ignore it yourself. Either you've violated policy, or I'm right about this being an antiquated example. --EncycloPetey 20:59, 4 December 2008 (UTC)[reply]

We're getting off-topic here, but the way I effectively created (I've explained to you before why the use of the word change is misleading here) {{pl-noun}} is clearly covered by Some languages do have characteristics that require variation from the standard format in the section "Variations for languages other than English". Generally, if a policy has changed, WT:ELE should be changed. In the absence of changes to ELE, its unambiguous statements (unlike the one we're currently concerned with) should be deemed authoritative, in particular as we refer all casual editors there. I have not accused you of a deliberate violation; I have stated that your advice to Panda is incompatible with the policy outlined at ELE. I'm sure this didn't happen out of ill will. To the "either ... or": There are almost never, except in mathematics, only two possibilities. -- Gauss 21:07, 4 December 2008 (UTC)[reply]

You are making an inference from an example used in the ELE. If you care to do that, then please note the phonetic transcription is not included in the example at all! So, by the same kind of inference you are making, the pronunciation section should not include phonetic transcriptions. Also, if you care to look at Wiktionary:Pronunciation, a different inference can be made from this draft policy/guideline page. There, rhymes, hyphenation, and homophones are listed as "additional items", implying that they should follow the items described as part of section layout. Again, nothing here is incompatible, since the example you keep harping about is preceded with the text "An order for these headings is recommended, but variations in that order are also allowable. A typical article that uses many of these additional headings could be formatted thus:" There are no "unambiguous statements" about this issue as you claim. In other words, this example mandates no order. I realize you hate to change your opinions, and are not amenable to admitting error on your part, so please just let this drop. --EncycloPetey 21:50, 4 December 2008 (UTC)[reply]

On the contrary, it is quite easy to change my opinions with reasonable arguments, and I have stated on WT expressly several times that I was wrong. In fact, you have a point in this discussion. The point is that different interpretations are possible and that no rule is put forward where such a rule would be expected to be found (i.e., in the Pronunciation section of ELE or at WT:Pronunciation). The second half is old news, I've written that myself a few paragraphs above (22:52, 30 November 2008). The paragraph we are discussing tonight is not called Example, it is presented with the words "a typical article ... could be formatted as follows thus". Of course, could is different from should but no other permissible way is suggested; instead, the order of headings in this template (template here not in the sense of the template namespace) is repeated further down on the page as rule. You misunderstood my remark containing the words "unambiguous statements"; it was made in the context of this discussion. I hope I've improved the wording now (text in green). To the matter itself, of course I find it good to have IPA and audio next to each other; I've stated that (implicitly) already above (last sentence of 22:52, 30 November 2008). I would even propose to move Homophones, Rhymes etc. to a later section, below the definitions, but I don't see any chance to get that realised, so I don't propose it. -- Gauss 22:45, 4 December 2008 (UTC)[reply]

I have started a vote, that hopefully will reduce some of the confusion created by the ELE example. The vote doesn't propose to change "policy", but it would revise the contentious example. --EncycloPetey 22:52, 4 December 2008 (UTC)[reply]

It might be a better idea to combine this (presumably uncontroversial) vote with a vote on (presumably just as uncontroversial) aspects and inconsistencies observed in Wiktionary_talk:Entry_layout_explained#Inconsistencies_between_ELE_and_Templates. Should we move tonight's additions in this section to the Beer Parlour? -- Gauss 23:04, 4 December 2008 (UTC)[reply]

I don't think this particular discussion would clarify the issue any, and would bring in unrelated issues to the discussion. I have started a BP thread (same title as the vote, and linked from it) where the fact confusion exists is noted. That, I think, is the key point, and there are plenty of people I've spoken to recently who agree that the example is confusing, misleading, or strongly biasing. If you would like to weigh in on that discussion with directly pertinent comments, I think that would be more useful than copying this thread there.

The template inconsistencies you note involve a lot more work. Personally, I would want to see a list of specific wording changes before having a vote, just to be sure there isn't additional "discussion" after the vote. I may have time to assist with writing such a proposal, if you're offering to write one. Two people working together tend to produce a more polished and well thought out product than one person working alone. --EncycloPetey 23:15, 4 December 2008 (UTC)[reply]

Ok, so should I change the bot's behaviour? --Derbeth ^talk 23:31, 7 December 2008 (UTC)[reply]

Not from the state of 3 December. Right, everyone? -- Gauss 23:19, 8 December 2008 (UTC)[reply]

Move inline fonts from script template to common.css

{{Cyrs}} (Old Cyrillic alphabet) has an inline font specification (style="font-family: BukyVede, ...; "). This is difficult to override in a user style sheet.

I'd like to add the following to MediaWiki:Common.css, in the section “Support for script templates”:

   /* Old Cyrillic; see Appendix:Old Cyrillic alphabet */
   .Cyrs { 
       font-family: BukyVede, Kliment Std, RomanCyrillic Std, Menaion, Menaion Medieval, Lazov, Dilyana, Code2000, DejaVu Sans, Lucida Grande, Arial Unicode MS, Lucida Sans Unicode;
       }

(I've also found that font-size: 1.37em; improves the display.)

The spec should be removed from the template when the cache expires (is that still 30 days?). Are there any other script templates which should get the same treatment? —Michael Z. 2008-11-20 18:36 z

Yes, we should put this in common.css. I'm not aware of any other particular script using explicit fonts, but would not be surprised. Robert Ullmann 19:55, 20 November 2008 (UTC)[reply]

[oops—confused by edit conflict] Note that there's a problem with using inline styles in attributes: they are more specific than declarations in the head or in linked style sheets. They can be overridden for contained elements, but not for the elements they are inline in. In this case a selector like .Cyrs or span.Cyrs will not override the template. span.Cyrs a will override it for a contained link, but not for all text in the span. —Michael Z. 2008-11-20 20:01 z
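
A minimal sketch of the specificity behaviour described above, assuming the template emits a single span that carries both the class and the inline style. The markup in the first comment and the replacement font are illustrative assumptions, and each rule is meant to be tried on its own:

```css
/* Assumed template output:
   <span class="Cyrs" style="font-family: BukyVede, ...;">text <a href="...">link</a></span> */

/* Loses to the inline style on the span itself. */
.Cyrs { font-family: "DejaVu Sans", sans-serif; }

/* Wins for the contained link: the link only inherits the inline font,
   and any rule that targets an element directly beats an inherited value. */
.Cyrs a { font-family: "DejaVu Sans", sans-serif; }

/* Wins for the span too: an !important declaration outranks a normal inline one. */
.Cyrs { font-family: "DejaVu Sans", sans-serif !important; }
```

The last form works today, but it forces every reader who wants a different font to use !important, which is the inconvenience the proposal tries to remove.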

Sorry, I removed my comment after a few minutes because I hadn't gotten it right; and I didn't have time then to fix it. Yes, you are quite correct that if the class and the style are in the same element, the specificity of the style attribute will override the class. What works is doing it like this:

<span style="font-family: BukyVede, Kliment Std, ...; "><span class="Cyrs">{{{1}}}</span></span>

(see {{Cyrs}}) so that the class will apply to the content of the inner span. (So should work now.) User CSS will work as usual, except that "inherit" isn't useful. The outer span can then be removed in 30 days when everyone has it in common.css. Robert Ullmann 15:11, 21 November 2008 (UTC)[reply]
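
The nested-span arrangement can be sketched as follows. The font list is the one proposed for Common.css earlier in this thread; the user rule, its font choice, and the assumption that user style sheets load after Common.css are illustrative only:

```css
/* Markup from the comment above:
   <span style="font-family: BukyVede, Kliment Std, ...;"><span class="Cyrs">{{{1}}}</span></span> */

/* MediaWiki:Common.css: applies to the inner span directly, so it beats the
   font the inner span would otherwise inherit from the outer inline style. */
.Cyrs {
    font-family: BukyVede, Kliment Std, RomanCyrillic Std, Menaion, Menaion Medieval, Lazov, Dilyana, Code2000, DejaVu Sans, Lucida Grande, Arial Unicode MS, Lucida Sans Unicode;
}

/* User style sheet (assumed to load after Common.css):
   same specificity, so the later rule wins without !important. */
.Cyrs { font-family: "DejaVu Sans", sans-serif; }

/* Not useful while the outer span exists: "inherit" resolves to the
   outer span's inline font list rather than to the page default. */
/* .Cyrs { font-family: inherit; } */
```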

Thanks, I hadn't thought of that. An improvement, anyway, and I'll build it into the script templates as I refurbish them. As a rule, stable style info should get migrated into the style sheet. HTML recommends “For optimal flexibility, authors should define styles in external style sheets.”[3] —Michael Z. 2008-11-21 22:36 z

I added the font spec. Any objections to setting the font size to 1.37em? —Michael Z. 2008-11-20 20:09 z

Survey of script templates

I scanned through category:Script templates. There's a lot of variation. Some use an HTML span, others font. A few apply lang and xml:lang attributes. Some seem to be placeholders which just echo the input.

55 of them have inline styles for fonts (often without a fallback), sizes (specified in various relative or absolute sizes), or both. Some use dir="rtl"; can this be replaced with CSS? Some insert the list {{unicode fonts}} for all browsers, others apply class=Unicode, which applies a different set of fonts, for MSIE only.

Yeesh. I'd like to clean this up.

What should the normal script template look like? I assume it should be a span, with lang and xml:lang attributes. Can all the font lists be safely copied to the style sheet, or at least ones that have been stable for a month?

Shall I make a page with a detailed survey for review, or can I just get to work? —Michael Z. 2008-11-20 20:52 z

My thoughts:

Yes, standardization is good, though some variation is to be expected, since the whole point is to compensate for variation in browser and font support between the various scripts.
Prefer <span> over <font>.
Don't use lang= and xml:lang=; these templates don't correspond to specific languages.
Always provide a class=. The name of the script template itself (Cyrl, Hebr, Hant, etc.) is usually a good class-name.
After copying inline styles to the central style sheet, wait at least a month before removing them from the script template.
Font lists (etc.) are not just for IE6. Just because a browser is capable of font substitution, that doesn't mean it always makes readable choices. (Depending on the script, anyway. Cyrillic and monotonic Greek are probably fine; Arabic and m'nukedet Hebrew are probably not.)
Sizes should all be relative, using either em-sizes or percentages. For non-head, non-bold, they should be at least 1em/100% (these templates shouldn't be making text smaller) and at most 1.5em/150% (more than that, and I suspect the editor making decisions for this script needs either new glasses or a different screen resolution).
dir="rtl" is desirable for right-to-left scripts. (It's true that CSS supports text-direction, and that browsers' BDO algorithms cover the common cases anyway, but if it's a right-to-left script, then why not have dir="rtl"? It's not like users are going to want to override that.)
Scripts should either ignore italics, or implement them as italics, with the former being preferred for most scripts. Other sorts of mapping (e.g. hiragana→katakana, which is roughly the kana equivalent of roman→italic) are inappropriate for an English dictionary.
Scripts should implement bold (face=bold and face=head) either as bold, or as a standard increase in font-size: maybe 1.25em/125%? (Relative to that script's non-bold font-size, I mean.)
Any actual changes in functionality (rather than in implementation of the same functionality) should be discussed at the talk-page for the script template, with links from each affected language's [[Wiktionary talk:About ____]].
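
As a rough illustration of the font-list, sizing, and italics points in the list above, a style-sheet rule for one script class might look like this. The font names and the exact size are placeholders, not values taken from any existing template:

```css
/* Hypothetical rule for a script class named after its template. */
.Hebr {
    /* Explicit list with a generic fallback, served to all browsers, not only IE6. */
    font-family: "Example Hebrew Font One", "Example Hebrew Font Two", sans-serif;
    /* Relative size, inside the suggested 1em to 1.5em range. */
    font-size: 1.15em;
    /* Ignore italics for this script. */
    font-style: normal;
}
/* Text direction is deliberately left to dir="rtl" in the template markup. */
```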

Do you agree/disagree? (Did I cover everything?)

—Ruakh_TALK 23:19, 20 November 2008 (UTC)[reply]

Roger, ack.

Do any of the following want to have lang and xml:lang attributes: {{hebrew}}, {{yi-Hebr}}, {{ks-Arab}}, {{ku-Arab}}, {{ota-Arab}}, {{ps-Arab}}, {{fa-Arab}}, {{pa-Arab}}, {{sd-Arab}}, {{ur-Arab}}, {{ug-Arab}}?

Many of the existing class names appear to be derived from languages (e.g., .AR for Arabic, .BN for Bengali, etc.). I will prepare for a transition to standardized class names based on scripts by adding a redundant class to the style sheet for now (e.g., .Arab, .Beng). After a month, I'll update the templates to write the new class names into entries. —Michael Z. 2008-11-21 00:54 z
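
A sketch of the transition described above, assuming the old and new class names simply share one rule. The declarations are placeholders; the real font lists are whatever Common.css already holds for each script:

```css
/* Transition period: the old language-derived name and the new script-derived
   name are grouped, so HTML cached with either class gets the same styling. */
.AR, .Arab { font-size: 1.25em; }   /* placeholder declaration */
.BN, .Beng { font-size: 1.25em; }   /* placeholder declaration */
```

Once the templates emit only the new names and caches have expired, the old selector can be dropped from each group.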

Template:lang2sc might be useful in this discussion. Conrad.Irwin 00:58, 21 November 2008 (UTC)[reply]

stop/slow down on that: as set up, it is a performance disaster. If added in the "obvious" way it will add large un-needed overheads to hundreds of thousands of pages on which it will do nothing at all! Remember, the template language is not procedural, it always expands everything. If you reference {{lang2sc}} from {{infl}}, it will be fully expanded (and more than once) on every single call. So don't get excited about it, it simply can't be done that way. (sorry) Robert Ullmann 14:45, 21 November 2008 (UTC)[reply]

I was looking at that! Can its function be rolled into templates like term, infl, etc., to automatically add class="Latn", etc? Ideally, one would never need to add “sc=Xxxx” to a template, although it would be nice to be able to override the default.

I'll go through all of the class names used and make sure there are no collisions. —Michael Z. 2008-11-21 01:49 z

It certainly should be, though we should probably do a quick glance through the dump to ensure that this is actually the case (I can probably do so in the next few days). Conrad.Irwin 02:06, 21 November 2008 (UTC)[reply]

You seem to be suggesting two separate things at once: (1) using {{lang2sc}} with lang= instead of an explicit sc=, at least for single-script languages; (2) using standard class-names instead of script templates (<span class="Latn">foo</span> instead of {{Latn|foo}}). Do I have that right? If so, I think both are good ideas (for the long term, anyway — it will definitely be a while before #2 is possible, if only due to all the CSS caching, and I'm not sure whether {{lang2sc}} is fully ready for prime-time yet, either). —Ruakh_TALK 02:58, 21 November 2008 (UTC)[reply]

The script templates will still be needed, even if little more than a span. For one thing, they are often used by themselves in entries (surely you aren't suggesting someone put the HTML span/class etc in the entry text? ;-). Also they keep the invoking templates from being a complete mess. And then there is the face/context issue that started this, that could be done in CSS, with a very great deal of pain for someone trying to add things, and would also require that all calling templates use yet more HTML instead of wikitext. (And do please try to keep in mind maintaining the MW objective of permitting wikitext to be translated to something other than HTML. It is often forgotten that HTML is only the target for the web site interface to the data! Even for print, it isn't very good. If someone is formatting wikitext to PDF, etc, {{lang|ru|face=bold|...}} is a lot better than a bunch of embedded HTML in various combinations, interpretable only by reverse engineering the style-sheets.) Robert Ullmann 14:59, 21 November 2008 (UTC)[reply]

Re: used by themselves in entries: Oh, definitely, I agree. I just meant {{term}}, {{infl}}, etc. using <span class="Latn">foo</span> instead of {{Latn|foo}}. (Unlike you and Connel, I really don't think HTML in entries is the end of the world, but it's obviously not ideal, and I'd never advocate it without a reason.) —Ruakh_TALK 15:08, 21 November 2008 (UTC)[reply]

lang2sc is a separate issue. For right now, all I'm proposing is standardizing the structure and output of the script templates like {{Latn|foo}}. I'll post a summary below before I start any major activity. —Michael Z. 2008-11-21 16:40 z

Re: {{yi-Hebr}} etc.: I don't see a problem with including lang/xml:lang for those. (Theoretically I suppose yi-Hebr could be used for any word in the Yiddish alphabet, but as a practical matter, I don't think that happens.) —Ruakh_TALK 02:58, 21 November 2008 (UTC)[reply]

I would like to see class="sc Arab" so that I can easily turn off these irritating things by adding .sc { font-family: inherit !important } to my monobook (and can add a similar rule to WT:PREFS). For OS/browser combinations that are built correctly, these almost invariably produce results worse than the defaults. (feel free to change the classname that is used, as long as the same one is used on all of the templates) Conrad.Irwin 01:09, 21 November 2008 (UTC)[reply]

I'll figure out the details—it may look like class="sc-Arab", so [class|="Arab"] would select any Arabic script (unsupported in MSIE, I think).

Once it's in place and you have refreshed the style sheet, you won't need to use “!important”. —Michael Z. 2008-11-21 01:49 z

I'd much rather a separate class name be used on all the templates. There would be little difference, and we would gain the ability to use much more widely understood (if only slightly more widely implemented) selectors (i.e. ".sc" instead of "[class^=sc-]"). Conrad.Irwin 02:04, 21 November 2008 (UTC)[reply]

Actually, I guess that would help MSIE compatibility too. I'll double-check that appropriate two-letter class names are free. If there's a problem, then an option would be to add class names like .lang-sc, etc. —Michael Z. 2008-11-21 02:46 z
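
The two naming schemes discussed above lead to different selectors. A brief sketch, where the class names "sc" and "sc-Arab" are the candidates floated in this thread and not settled names:

```css
/* Option A, assuming class="sc Arab": one extra class shared by every script template. */
.sc { font-family: inherit; }

/* Option B, assuming class="sc-Arab": attribute selectors, which IE6 ignores. */
[class|="sc"] { font-family: inherit; }    /* value is exactly "sc" or starts with "sc-" */
[class^="sc-"] { font-family: inherit; }   /* value starts with "sc-" (CSS3) */
```

Both attribute selectors test the whole class attribute, so they stop matching as soon as another class name is written before "sc-Arab"; the plain class selector in option A has no such restriction.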

Script templates to merge or delete

I've started cleaning up the script, transliteration, and pronunciation templates, and their corresponding styles in MediaWiki:Common.css. Templates are catalogued at User:Mzajac/Script template cleanup.

There are a few which might ought to be merged or eliminated:

{{IPAAusE}} should probably be replaced by AutoFormat with {{a|AusE}} {{IPA}} (why is that not “en-AU”?)
{{IPA Rhymes}} seems redundant
{{ipac}} appears to belong to an abandoned project
{{Lchar}} should probably be deleted now
{{Hang}} and {{Kore}} appear to be redundant
{{Hebr}}, {{hebrew}}, and {{yi-Hebr}} appear to be redundant

—Michael Z. 2008-11-24 06:55 z

Is there an index to script templates anywhere? For example I know that {{IPA Rhymes}} is redundant to {{Rhymes}}, but what are the other ones you mark as redundant redundant to? It took a lot of effort (about three Wikipedia articles and looking at Category:Script templates) the other day to work out that {{Xsux}} is what I needed for Cuneiform. A page listing the name of the script and which script template to use for it would be helpful. Thryduulf 22:32, 24 November 2008 (UTC)[reply]

(that's why we have {{Cuneiform}} ... which is then automatically replaced with Xsux, not the most obvious name ;-) Robert Ullmann 14:42, 25 November 2008 (UTC)[reply]

Well, it looks like IPA Rhymes is the same as IPA + Rhymes, so what's the point? As far as I can tell, Hang and Kore are practically identical, but I don't know if they're reserved for slightly different uses and there's a potential divergence in the future. Same for the Hebrew templates, but I don't know if Yiddish has different typographical requirements and maybe is waiting for a better Yiddish Hebrew font to be released. Also, {{Grek}} and {{polytonic}} only differ by a single font, and perhaps they could be merged. Summary is at User:Mzajac/Script template cleanup.

Shoot {IPA Rhymes}, not needed, and not used anywhere. Probably forgotten, except for this discussion. Same with {IPAAusE}, created by the same user, and not used anywhere. (no need for AF to do anything) Robert Ullmann 14:42, 25 November 2008 (UTC)[reply]

I'd like it all to be consistent and self-documenting. I've set the templates up for docs on their talk pages, and I will add a basic description to each one. A template index might be nice, but may get out of synch with reality. Maybe better to point to them from each “About language” page, each relevant guideline (WT:ELE and children), and from each practical template's documentation ( {{term}}, {{infl}}, {{t}}, etc).

I usually start at Wikipedia articles about languages in scripts, which list the respective ISO codes in their infoboxes. I've also found useful w:List of ISO 15924 codes (scripts), w:List of ISO 639-1 codes, and w:List of ISO 639-2 codes (languages). —Michael Z. 2008-11-24 23:03 z

I think {{Kore}} and {{Hang}} should both be available for use, seeing as they're both actual ISO script codes, but I think it should be fine for {{Hang}} to redirect to {{Kore}} (since it's a subset of it). As for the others, sounds good to me. —Ruakh_TALK 01:24, 25 November 2008 (UTC)[reply]

That will be okay if we can ensure that some well-meaning person won't start "snapping" the redirects. I'd be happier if they stayed separate, even if redundant. (similarly with the 3 Japanese codes). {hebrew} is redundant, {yi-Hebr} exists because of someone who kept trying to import templates from yi.wikt, with their naming conventions. Robert Ullmann 14:42, 25 November 2008 (UTC)[reply]

Okay: I'll do the following, if there's no objection:

~~{{IPAAusE}}: replace by {{a|AusE}} {{IPA}} and delete (assuming that it is not used on an ongoing basis to transwiki from Wikipedia)~~ [replaced and RFD'd —Michael Z. 2008-11-25 19:50 z]
~~{{IPA Rhymes}}:rfd and delete~~ [RFD'd —Michael Z. 2008-11-25 19:50 z]
{{ipac}} rfd and delete [needs a better look]
~~{{Lchar}} delete immediately~~ [removed and deleted —Michael Z. 2008-11-25 19:50 z]
{{Hang}} and {{Kore}}: I didn't realize these had different script definitions. They can remain separate, but share the same font specification in the style sheet
{{Hebr}}, ~~{{hebrew}}, and {{yi-Hebr}}: merge~~ [redirected and RFD'd —Michael Z. 2008-11-26 01:43 z]

Three Japanese codes? I only know of {{Hrkt}} which redirects to {{Jpan}}, plus {{JAruby}}—is that all? —Michael Z. 2008-11-25 15:54 z

15924 defines {{Hani}} (for Han/Kanji), {{Hira}}, {{Kana}}, {{Hrkt}}, and {{Jpan}} of which as you can see we have 3. There probably isn't any use for Hira/Kana given Hrkt. Robert Ullmann 11:42, 26 November 2008 (UTC)[reply]

Thanks. {{Hani}} currently sets lang="zh" in the HTML. Is this correct, or do we also use it for Japanese and Korean? —Michael Z. 2008-11-26 17:52 z

It isn't "correct", but since Japanese can use Jpan, and Korean either Kore or Hant, and Vietnamese Hant (Korean and Vietnamese are always traditional characters) it probably is a reasonable default. We probably should have a {{ko-Hant}} to set the language for Korean, as the characters often appear differently in the Korean-specific fonts. A number of the POS templates for these have their own explicit code now. Robert Ullmann 10:30, 27 November 2008 (UTC)[reply]

Greek font lists

Can we also merge the font lists (but not the templates) for {{Grek}} and {{polytonic}}? This would merely add DejaVu Sans to the front of the list for Grek, making the two display consistently on any machine. —Michael Z. 2008-11-25 15:54 z

That was previously done and reverted, people prefer ancient greek to be serif, while leaving modern greek sans-serif. Conrad.Irwin 19:49, 25 November 2008 (UTC)[reply]

But the only change would be to add a sans-serif font (DejaVu Sans) as first choice for modern Greek, which would make the lists identical. This would only affect the display of modern Greek for MSIE users who have DejaVu Sans installed. There would be no change for ancient Greek at all. And the classes would remain, so readers' style-sheet overrides wouldn't be affected either.

(By the way, on my vanilla MSIE/WinXP test system there is no difference in the display of Greek in the two templates. And I'm no expert, but I believe that serifs were a Latin innovation which was later applied to Greek capitals—perhaps you are comparing a thick-and-thin font to one with uniform (“monoline”) strokes.) —Michael Z. 2008-11-25 20:45 z
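
The proposed merge of the default lists, with the two classes kept distinct, could be sketched like this. Only DejaVu Sans as first choice comes from the proposal; the rest of the list is elided here and stands in for whatever the style sheet already specifies:

```css
/* Two classes, one default font list. The selectors stay separate,
   so a reader can still override either one on its own. */
.Grek, .polytonic {
    font-family: "DejaVu Sans", sans-serif;   /* placeholder: the real list continues after DejaVu Sans */
}

/* Example reader override for Ancient Greek only, in a personal style sheet. */
.polytonic { font-family: "Palatino Linotype", serif; }
```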

Ok, maybe things have changed. see User_talk:Conrad.Irwin/i#.7B.7Bpolytonic.7D.7D_and_.7B.7BGrek.7D.7D. Did you decide which classname will be the same on all the templates yet? Conrad.Irwin 20:51, 25 November 2008 (UTC)[reply]

Not yet, but where there are different scripts or script variants, I would prefer to retain separate class names, for flexibility—so if the font requirements diverge when better fonts become available, or if someone prefers to override them separately, there will be no problem. But conversely, I'd like to merge or harmonize the default font specifications wherever possible, to simplify administration, and for uniform and elegant presentation.

In the longer run, I'd like to see if all foreign-language terms can have standards-compliant language tags applied, with script subtags where appropriate—then fonts can be primarily applied by selecting on these. —Michael Z. 2008-11-25 21:05 z

I think you misunderstand, I want to be able to override them all with one CSS rule. It's perfectly legitimate to have two class names on an object, so I'd like something like class="sc_template_class script_specific_class", so the rule .sc_template_class { } will let me turn off the font-overriding for all of these templates with one rule. Conrad.Irwin 21:09, 25 November 2008 (UTC)[reply]

Ah, right. No, haven't decided yet, but I'll make sure that there is such a class. I'm still tempted to suggest composite language/script class names to prevent future collisions. (I've located one so far: the existing class .new corresponds to new, the ISO 639-2 code for Nepal Bhasa; Newari, which has no ISO 639-1 code.)

Regarding the Greek font display changes, I think I may have tracked that down. On March 22 the Greek font specs were migrated from the templates to Common.css—the template style for polytonic (in the since-deleted Template:polytonic fonts) had had Palatino Linotype listed first, which may have made Ancient Greek look different previously. Although the change was mentioned in the discussion you linked, it seems that no one cared enough to actually restore the old behaviour. As far as I can tell, modern and Ancient Greek have looked the same for the last seven months for anyone who hasn't used their own style sheet, with the exception of MSIE users with DejaVu Sans installed.

So I'll be bold and merge these two font lists in a little while. —Michael Z. 2008-11-25 21:28 z

First of all, I must admit that I am completely ignorant of each and every thing you guys are talking about. It's all Greek to me. :P Anyway, I just want to make sure that User:Atelaes/monobook.css will still work (after switching polytonic to Grek), as I really do like Palatino Linotype over other fonts for Greek script. Past that I'm certainly in favour of a unified presentation of Greek and Ancient Greek. Just let me know when the switch is made, would you? Also, there are quite a few instances of {{polytonic}}. Making the switches with AWB will clog up RC for days, and so I suggest it be done by bot. Sorry for the intrusion. Thanks. -Atelaes λάλει ἐμοί 07:58, 26 November 2008 (UTC)[reply]

Short story: it will continue to work.

I won't be merging these two templates or doing much of anything which requires AWB or a bot (I only used AWB to eliminate some underused templates, sorry if that caused trouble). For Greek, I'll only make a minor font change in one class, so that by default the two scripts will look the same. And I'll keep you updated here. Let me know if anything gets screwed up, and I'll try to fix it. Cheers. —Michael Z. 2008-11-26 08:14 z

Migrating inline styles to the style sheet

I've removed inline styles from the following templates, because they have been in the style sheet MediaWiki:Common.css for over a month (compare to rev. July 29): {{Arab}}, {{fa-Arab}}, {{ks-Arab}}, {{sd-Arab}}, {{ur-Arab}}.

I've copied or synched inline styles from the following templates into the style sheet, replacing any styles there. They should be kept in synch, and then removed from the templates in a month: {{ku-Arab}}, {{pa-Arab}}, {{ota-Arab}}, {{ps-Arab}}, {{ug-Arab}}, {{Cans}}, {{Cher}}, {{Khmr}}, {{Deva}}, {{Knda}}, {{Laoo}}, {{Mymr}}, {{Runr}}, {{Taml}}, {{Telu}}, {{Tfng}}.

Keeping 47 script templates, with another 36 subordinate “fonts” and “font size” templates all in synch with corresponding specifications in the style sheet is ridiculous. Ridiculous. I have made the following subordinate templates obsolete: {{Arabic fonts}}, {{Arabic font size}}, {{Persian fonts}}, {{Persian font size}}, {{Kashmiri fonts}}, {{Kashmiri font size}}, {{Kurdish font size}}, {{Kurdish fonts}}, {{Ottoman Turkish font size}}, {{Ottoman Turkish fonts}}, {{Punjabi Shahmukhi font size}}, {{Punjabi Shahmukhi fonts}}, {{Sindhi font size}}, {{Sindhi fonts}}, {{Uyghur font size}}, {{Uyghur fonts}}, {{Urdu font size}}, {{Urdu fonts}}, {{Canadian syllabic fonts}}, {{Cherokee fonts}}, {{Khmer fonts}}, {{Khmer font size}}, {{devanagari fonts}}, {{kannada fonts}}, {{Lao fonts}}, {{Burmese fonts}}, {{Runic fonts}}, {{Runic font size}}, {{Tamil fonts}}, {{Telugu fonts}}, {{Telugu font size}}, {{Tifinagh fonts}}.

More to follow. —Michael Z. 2008-11-26 08:07 z

Persian entries are now not displaying using the desired font (Tahoma) or font sizing (Firefox). Was that supposed to happen? Tahoma font is desirable for Persian as it is very clear, making it well-suited for learners, even if it is not very aesthetically pleasing. It is also the most popular font for Persian on the web, and I think that there used to be a technical reason, maybe something used to not display properly in other fonts. In the future we may want to have the option of using another Persian font such as XB Zar. Anyway, those are the reasons for specifying Persian separately from other languages using the Arabic script. Kaixinguo 20:08, 26 November 2008 (UTC)[reply]

OK, I just checked and Persian is displaying in Tahoma as before in IE6 (XP). Could someone please let me know how Persian, Arabic and Urdu are displaying using Firefox for them? Perhaps it is just my problem :-D Kaixinguo 20:26, 26 November 2008 (UTC)[reply]

Not just your problem—I disabled these font specs for all browsers except MSIE. Now I have restored the blanket font specification for all the Arabic scripts in all browsers. Reload, and let me know if it looks as expected. —Michael Z. 2008-11-26 21:53 z

Thanks, it is displaying in Tahoma again now. Could you let me know why font specs would be disabled for all browsers except IE? Thanks. Kaixinguo 22:40, 26 November 2008 (UTC)[reply]

Most browsers (Safari, Firefox, Opera, anyway) can deal with international text—e.g., if I put a paragraph or word from the Cyrillic Unicode block in a page of Latin-alphabet text, they will just pick a font that has the correct Cyrillic characters and substitute it. Display only breaks if you have a buggy font, which “has” the characters but displays squares or blanks for them, or if you have no font with the correct characters. You can mix dozens of languages and scripts on a page, e.g., Kermit UTF-8 sampler displays fine in Safari or Firefox if you install a few fonts.

MSIE 6 and 7, on the other hand, when confronted with the same page, will pick a font which has Latin characters, and use it to display all text on the page, disregarding any text in a foreign script, with math or technical characters, dingbats, etc, which of course is likely to display as little squares. For brain-dead MSIE, the web page's designer has to guess which appropriate fonts you may have installed on your Windows system, and list them in the page for each block or span of different-language text.

For example, Safari/Mac's default sans-serif Helvetica font is attractive and has a wide international range—it will display a nice-looking page, and substitute the good-looking Lucida Grande for more obscure characters. If a web author specifies the Win font Arial Unicode MS for its wide international range, however, which is included on the Mac for compatibility, that font has no bold, displays bold text in an ugly artificially-boldfaced font, and displays certain diacritics incorrectly, making the page look like ass on a Mac.

So it's common practice to specify fonts for MSIE just so it can display multilingual Unicode text, but let other browsers have free rein to use the page's specified fonts if they work, or choose from the most appropriate Unicode fonts installed. This tends to give the reader the best-looking experience.

Of course, we have a very wide variety of scripts and languages to cover, and in some cases specialized fonts must be specified for more browsers. Sorry for the long explanation, but the way MSIE brings down international web authoring and forces us to jump through hoops is a pet peeve of mine.

By the way, was the Persian text displayed incorrectly without Tahoma, or just not as readably? Are you using Firefox/WinXP? In Safari/Mac Persian displays the same way before and after (or with and without the template). I don't read Arabic, so I can't say whether it is correct. —Michael Z. 2008-11-27 00:45 z

One nice thing about FF (at least) is once you have set the language for something, it continues to use that language for that script until set to something else. So when the inflection line in a Japanese section selects lang=ja, it applies to all the referenced terms in Related and Derived terms etc; if the next section is Mandarin, it starts using lang=zh fonts. Robert Ullmann 10:44, 27 November 2008 (UTC)[reply]

Thanks for explaining that. Most of the time I am using Firefox/XP. In fact, AFAIK Persian displays totally fine without these specifications, only rather on the small side. However, Tahoma is very easy for people learning Persian. Look at this text in Tahoma: یکی بود یکی نبود compared to without: یکی بود یکی نبود. I will try and find out what it is that is meant to display well in Tahoma. I'm not sure why changing the font specifications to apply only to IE meant that Firefox then displayed in the default Wiktionary font, because I have specified Tahoma in my settings in Firefox :-/. Anyway, this is mostly over my head so I will stay out of it.

There have been lots of problems with the display of Persian in Windows in the past, some of the effects of which are still ongoing. Windows fonts originally did not have support for the Persian letter 'yeh' U+06CC ARABIC LETTER FARSI YEH, ی, rendering it as (Arabic) U+064A ARABIC LETTER YEH ي with two dots beneath. If there is any word containing this letter, it will have just as many millions of results when you search for it spelled with ي as with ی. There is a Wiktionary entry for یار (spelling: ی - ا -ر ) and not one for يار (spelling: ي - ا -ر), even though they appear identical. There is a similar issue with the letter 'kaf' ک: in earlier versions of Windows it was confused with (Arabic) ك. This affects the usability of Wiktionary for the Persian language; there is more information in this PDF: http://behdad.org/download/Publications/persiancomputing/a007.pdf. Kaixinguo 21:30, 27 November 2008 (UTC)[reply]
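
One way to soften the yeh/kaf problem for lookups is to fold the Arabic code points onto their Persian counterparts before comparing strings. The following is only a sketch: the name normalizePersian is hypothetical and not part of any Wiktionary script, it covers just the two letters mentioned above, and it would be wrong to apply it to text that is actually Arabic.

```javascript
// Hypothetical helper: fold Arabic yeh/kaf onto the Persian letters
// so that visually identical spellings compare equal.
const ARABIC_TO_PERSIAN = {
  "\u064A": "\u06CC", // ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
  "\u0643": "\u06A9", // ARABIC LETTER KAF -> ARABIC LETTER KEHEH
};

function normalizePersian(text) {
  return text.replace(/[\u064A\u0643]/g, (ch) => ARABIC_TO_PERSIAN[ch]);
}

// The two spellings of the entry discussed above differ only in the
// first code point.
const withArabicYeh = "\u064A\u0627\u0631";
const withFarsiYeh = "\u06CC\u0627\u0631";
console.log(normalizePersian(withArabicYeh) === withFarsiYeh); // true
```

A search box could run both the query and the entry title through such a function, so that either spelling finds the same page.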

Update: I had screwed up, by changing the font specifications for several scripts to apply only to MSIE, when I copied these to the style sheet MediaWiki:Common.css.

I think I've fixed it all now, restoring the font set for Cyrillic[4] and the various Arabic scripts[5] yesterday, and for eleven more today: {{Cans}}, {{Cher}}, {{Khmr}}, {{Knda}}, {{Laoo}}, {{Mymr}}, {{Runr}}, {{Taml}}, {{Telu}}, {{Tfng}},[6] and {{Deva}}.[7]

All of these should now be displayed the same way they were before I started, four days ago. Please reload to update the style sheet, and let me know if any non-Latin text still looks wrong. —Michael Z. 2008-11-27 18:06 z

Update: I have copied class names and font styles from nine more script templates to the style sheet: {{Armn}}, {{Glag}}, {{Gujr}}, {{Hebr}}, {{Mlym}}, {{Ogam}}, {{Orya}}, {{Phnx}}, {{Syrc}}.[8]

There remain about 17 script templates which are not represented in the style sheet and do not have class names assigned. There are some other issues with class names, so I will put together a proposal and post it here before I continue. —Michael Z. 2008-11-27 20:38 z

New documentation at Wiktionary:Script templates

I've created a new documentation page. Please review and improve, add appropriate categories, etc. —Michael Z. 2008-11-27 03:37 z

Strange symbols when editing

When you edit a document, you can make a selection of symbols available from a drop-down list at the bottom (I forget the proper name of this facility). When I select "Italian" I now get this (ƀ đ ǥ g̑ ǵ ʰ k̑ ḱ l̥ m̥ n̥ r̥ þ ʷ ₁ ₂ ₃ ) instead of the characters with accents that I was expecting. Any ideas? SemperBlotto 08:39, 21 November 2008 (UTC)[reply]

p.s. Selecting "Latin/Roman" gives ===Alternative forms=== ===Etymology=== ===Pronunciation=== ===Noun=== ===Adjective=== ===Verb=== ===Adverb=== ===Pronoun=== ===Preposition=== ====Usage notes==== ====Synonyms==== ====Antonyms==== ====Derived terms==== ====Related terms==== ====Translations==== ===See also=== ===References===

The feature you are referring to is called "edit tools" and they are defined at MediaWiki:Edittools. That page has been edited four times in the past 24 hours: Hamaryns added the "Headers" section, and Rodasmith added and then tweaked the "Sign languages" section. I've just made a minor change to the IPA section (a previous failed attempt to add the syllabic consonant diacritic had attached itself to the secondary stress marker for some reason).

They are working fine for me, but for you it seems that you are seeing the section above what you are expecting - "Headers" instead of "Latin/Roman" and "Indo-European" instead of "Italian". I don't know why you are getting this, but perhaps doing a hard refresh on the MediaWiki page might help. Thryduulf 13:04, 21 November 2008 (UTC)[reply]

The problem is that when we add a new section, we modify both [[MediaWiki:Edittools]], which your browser doesn't cache, and [[MediaWiki:Monobook.js]], which it does; so until you clear your browser cache, you'll have the old version of [[MediaWiki:Monobook.js]] and the new version of [[MediaWiki:Edittools]], and things will be wonky. (We should really fix that; I'll take a look today and see if I can find a better way for the future.) As Thryduulf says, doing a hard-refresh (on any Wiktionary page) will likely help. —Ruakh_TALK 13:41, 21 November 2008 (UTC)[reply]

O.K., I've now modified [[MediaWiki:Monobook.js]] to dynamically match [[MediaWiki:Edittools]], so once you have the new version of [[MediaWiki:Monobook.js]] (which should happen once you've hard-refreshed any Wiktionary page), this should never happen again. (Tested in FF3, IE7, Chrome on WinXPPro. Let me know if you see any problems.) —Ruakh_TALK 14:48, 21 November 2008 (UTC)[reply]
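
The mismatch described above comes from keeping the list of section names in one page and the sections themselves in another. A minimal sketch of the alternative, assuming the script can read the section titles from the edit tools at load time (the function names and the short title list are illustrative, not the actual Monobook.js code):

```javascript
// Hypothetical sketch: derive the drop-down entries from the section
// titles actually present, instead of keeping a second hard-coded list.
function buildMenu(sectionTitles) {
  return sectionTitles.map((title, index) => ({ label: title, index }));
}

function sectionForChoice(menu, label) {
  const entry = menu.find((item) => item.label === label);
  return entry ? entry.index : -1;
}

// Adding "Headers" at the front shifts every index, but the menu is
// rebuilt from the same list, so labels and sections stay aligned.
const menuBefore = buildMenu(["Latin/Roman", "Indo-European", "Italian"]);
const menuAfter = buildMenu(["Headers", "Latin/Roman", "Indo-European", "Italian"]);
console.log(sectionForChoice(menuBefore, "Italian")); // 2
console.log(sectionForChoice(menuAfter, "Italian"));  // 3
```

Because both the labels and the indexes come from one source, a stale cached script can no longer point a label at the wrong section.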

Well done. Now OK. SemperBlotto 15:15, 21 November 2008 (UTC)[reply]

JavaScript-generated edit-tools.

Currently, the <charinsert> pseudo-elements in the edittools are processed on the server side by the MediaWiki parser. Every time you visit an edit-page, your browser receives whatever's in the edit window; 239KB of edit-tools; and 19.5KB of other stuff. If you're creating a new section, the edit-tools are about 92% of the HTML. Granted, we're still talking about less than 3 MB per edit-page, but it seems rather pointless when you consider that most editors probably use only a handful of its characters, if any.

I'd like to suggest that we stop using the <charinsert> feature, and instead use <span class="charinsert">, together with something like this in [[MediaWiki:Monobook.js]]:

JavaScript snippet (tested in FF3, IE7, Chrome)

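// Turn the plain-text contents of every <span class="charinsert"> in the
// edit tools into clickable links that call insertTags().
function applyCharinserts()
{
  // Escape a string for use inside a single-quoted JavaScript string that
  // itself sits inside a double-quoted HTML attribute.
  function patchUpInsertTagsArg(arg)
  {
    return(
      arg.replace(/\x22/g,'&quot;').replace(/\x27/g,"\\'").replace(/&#160;/g,' '));
  }

  // Convert whitespace-separated tokens into links. A token "a+b" inserts
  // "a" before the selection and "b" after it.
  function charinsertify(s)
  {
    // Leave content that already contains markup untouched.
    if(s.indexOf('<') > -1)
      return s;
    var strings = s.split(/\s/);
    for(var i = 0; i < strings.length; ++i)
    {
      if(strings[i] == '')
        continue;
      var left, right, index;
      index = strings[i].indexOf('+');
      // No '+': the whole token goes before the selection.
      if(index == -1)
        index = strings[i].length;
      left = strings[i].substring(0, index);
      right = strings[i].substring(index + 1);
      // The visible link text is the token without the '+'.
      strings[i] = left + right;
      left = patchUpInsertTagsArg(left);
      right = patchUpInsertTagsArg(right);
      strings[i] = "<a onclick=\"insertTags('" + left + "','" + right +
                   "','');return false\" href='#'>" + strings[i] + '</a>';
    }
    return strings.join(' ');
  }

  var edittools = document.getElementById('editpage-specialchars');
  if(! edittools)
    return;
  var spans = edittools.getElementsByTagName('span');
  if(! spans)
    return;
  for(var i = 0; i < spans.length; ++i)
  {
    // Only process spans whose class list includes "charinsert".
    if((' ' + spans[i].className + ' ').indexOf(' charinsert ') == -1)
      continue;
    spans[i].innerHTML = charinsertify(spans[i].innerHTML);
  }
}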

It would reduce the bandwidth per edit-page by about 80% (the current edit-tools would still be about 23KB), and would have the additional benefit that for editors without JavaScript, the edit-tools would be more usable (since the text wouldn't be links, so it would be easier to copy and paste).
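
For anyone who wants to check the percentages, the arithmetic can be reproduced from the sizes quoted in this proposal. This is just a worked calculation; it assumes the new-section case, where the edit window itself is empty and adds nothing to the page.

```javascript
// Sizes in KB as quoted above; the edit window is assumed empty
// (a new section), so it contributes nothing.
const editToolsKB = 239;
const otherKB = 19.5;
const slimEditToolsKB = 23;

const totalBefore = editToolsKB + otherKB;    // 258.5
const totalAfter = slimEditToolsKB + otherKB; // 42.5

// Share of the page taken up by the current edit tools.
const shareBefore = (editToolsKB / totalBefore) * 100;
// Saving from shipping the slimmer edit tools instead.
const reduction = ((totalBefore - totalAfter) / totalBefore) * 100;

console.log(shareBefore.toFixed(1)); // "92.5"
console.log(reduction.toFixed(1));   // "83.6"
```

Both results are in line with the rough figures of 92% and 80% given above.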

The above is just a jumping-off point; there are other features we'd probably want as well, such as the ability for the characters to be pulled from other .js files (firstly allowing to be cached rather than pulled with every page-load, and secondly, if we want, allowing them to be pulled only when a user navigates to that section of the drop down, rather than being pulled for every user). We'd also want users to be able to add their own common characters for their own use (which I think Conrad.Irwin has already written code for somewhere). And we'd probably want to be able to display character+diacritic and insert only diacritic. But I think it's a good jumping-off point; it supports everything we're currently using.
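
The "pulled only when a user navigates to that section" idea could look roughly like the following. It is a sketch under assumptions: makeSectionLoader and fakeFetch are hypothetical names, and the real fetch step (for example, loading another .js file) is replaced by a stand-in so the caching behaviour is visible.

```javascript
// Hypothetical sketch: fetch a section's characters only the first time
// the user opens it, then serve it from memory.
function makeSectionLoader(fetchSection) {
  const cache = new Map();
  return function load(name) {
    if (!cache.has(name)) {
      cache.set(name, fetchSection(name));
    }
    return cache.get(name);
  };
}

// Stand-in for whatever would really pull the data.
let fetchCount = 0;
function fakeFetch(name) {
  fetchCount += 1;
  return name === "IPA" ? ["ə", "ʃ", "ŋ"] : [];
}

const loadSection = makeSectionLoader(fakeFetch);
loadSection("IPA");
loadSection("IPA");
console.log(fetchCount); // 1
```

A user who only ever opens the IPA section would then never download the other sections at all.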

Thoughts?

—Ruakh_TALK 21:24, 21 November 2008 (UTC)[reply]

P.S. BTW, whatever we want to do, we have to add support for it in [[MediaWiki:Monobook.js]] about a month before we add the use of it in [[MediaWiki:Edittools]], since the former gets cached and the latter does not. —Ruakh_TALK 21:27, 21 November 2008 (UTC)[reply]

I have a working implementation of something similar at User:Conrad.Irwin/edittools.js, though I haven't looked at it for some time. I'd like to do this, particularly the bit about not loading all of edittools every page edit. Conrad.Irwin 02:15, 22 November 2008 (UTC)[reply]

Yes, reducing the size of the page is good. On a regular basis I use only the IPA and templates sections (although now I have X-SAMPA input to type IPA I will probably be using the edit tools less) and, particularly when editing from my phone, I don't want to load all the characters for Devanagari, Portuguese, Ancient Greek, etc when I don't need them. Thryduulf 02:29, 22 November 2008 (UTC)[reply]

I was involved in a discussion of Javascript-based Edittools on WP a while back. Here is something I put together at the time: w:User:Mike Dillon/Proposals#JavaScript version of MediaWiki:Edittools. Not sure if they ever did anything with it, but I think some other Wikipedias ended up adopting my code in some form. Mike Dillon 04:22, 22 November 2008 (UTC)[reply]

machine generated audio files

I would like to propose for discussion: Implement an option to play a machine-generated audio file of a word pronunciation (.ogg), at least for all words that don't have human voice recordings yet, or always as an alternative and for comparison.

Either create audio files scripted on the server (or even on the fly when a user requests it), or, if nothing server-side is possible, batch-create audio files somewhere locally and mass-upload and link them via bot. Proposed software: festival or flite. Example in a Linux Bash shell: echo "word" | flite. See also: flite, festvox, festival. Mutante 01:37, 22 November 2008 (UTC)[reply]

<mutante> has there been discussion before on having an option of a machine generated ogg?
<mutante> at least for English words some artifical voices are not that bad
<Equinox> heh, that's a brilliant idea. i bet someone must have made a speech synthesiser that can use IPA or SAMPA
<Equinox> i guess people'd say the speech synths aren't good enough, aren't close enough to human vocal cords yet. but the idea is really good, i think
<Equinox> and, at least, as a default for pages where there's no human recording yet.
<mutante> do you know festival / flite in Linux?
<P0lygl0t> Equinox: unfortunately I haven't found anything that can pronounce IPA
<P0lygl0t> Festival was what came closest indeed
<Equinox> put it on one of the discussion pages. really good idea.

I had wondered about this. Nothing for Windows? DCDuring TALK 02:50, 22 November 2008 (UTC)[reply]

Doing a bit of research on this, I can't find any free software offering IPA reading, and the demo of the AT&T proprietary software didn't seem to be able to pronounce the en-uk tomato IPA (though whether that's its fault or the IPA is wrong I do not know). While someone might be able to add IPA support to festival with little difficulty, I don't know if we know such a person. Conrad.Irwin 16:21, 23 November 2008 (UTC)[reply]

This is an area well worth watching. It would add enormously to the utility for our larger population of users of all the work that has gone into phonetic pronunciations.

BTW, is it possible to automagically translate among the three main phonetic alphabets semi-reliably, particularly into IPA? Even imperfect translations, to be checked by all of our IPA readers would be helpful, I would think. DCDuring TALK 18:45, 23 November 2008 (UTC)[reply]

It's certainly possible - I have some javascript which does IPA -> SAMPA, and could quite easily be reversed. It would just be a matter of someone knowledgeable (which I am not) to build a conversion table - or to point me in the direction of a good one. It would not be perfect though, as there is not always a one-one correspondence. Conrad.Irwin 21:14, 23 November 2008 (UTC)[reply]
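
To make the conversion-table idea concrete, here is a small sketch of a table-driven IPA to X-SAMPA converter. It is not the script mentioned above; the table is a tiny illustrative subset, the function name is made up for this example, and anything without an entry is simply passed through, which is where the lack of a one-to-one correspondence would show up.

```javascript
// Illustrative subset of an IPA -> X-SAMPA table; a real table would be
// far larger and would need review by someone who knows both systems.
const IPA_TO_XSAMPA = new Map([
  ["ə", "@"], ["ʃ", "S"], ["ʒ", "Z"], ["θ", "T"], ["ð", "D"], ["ŋ", "N"],
  ["ɪ", "I"], ["ʊ", "U"], ["ɛ", "E"], ["ɔ", "O"], ["ɑ", "A"], ["æ", "{"],
  ["ˈ", "\""], ["ˌ", "%"], ["ː", ":"],
]);

function ipaToXsampa(ipa) {
  // Characters without an entry are passed through unchanged.
  return Array.from(ipa, (ch) =>
    IPA_TO_XSAMPA.has(ch) ? IPA_TO_XSAMPA.get(ch) : ch
  ).join("");
}

console.log(ipaToXsampa("ˈwɔːtə")); // "wO:t@
console.log(ipaToXsampa("θɪŋ"));    // TIN
```

Reversing the table is less straightforward than it looks, because plain Latin letters in the X-SAMPA string also stand for themselves.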

I don't like the idea of batch-loading these files by bot without someone's having checked them. If a user wants to know how a word is pronounced and trusts a computer program's telling him, he can use a program himself: we're here for accuracy. (Recent versions of MS Windows come with a screen reader.)—msh210℠ 17:39, 24 November 2008 (UTC)[reply]

If we really wanted accuracy at all costs we wouldn't have all of wikt done by amateurs. An idea would be to have flags: say, one for whether a pronunciation has been checked by some expert and another when challenged by a (qualified?) user. Right now we may have a number of imperfect IPAs because we don't have enough folks to correct them. The only thing that limits the damage is the limit on the number of those who need pronunciation information who also can read IPA. DCDuring TALK 19:54, 24 November 2008 (UTC)[reply]

I think it is really important that the audio files be labelled as synthesized, and that shown on the page. And that replacing them with human files should be then strongly encouraged.

See depolarization.

And read User:Keffy/IPAc. (:-) Robert Ullmann 14:53, 25 November 2008 (UTC)[reply]

Wiktionary:Beer parlour archive/March 06#Great Pronunciation Flood -- the dikes are cracking explains more, and links to all the bits and pieces intended to use festival and some bot-loading. Robert Ullmann 15:09, 25 November 2008 (UTC)[reply]

I wasn't in love with the quality, in the sense that it seemed too quick, a bit fuzzy, and the stress wasn't exaggerated enough for learning purposes. That said, if others agree with my assessment, it might be possible to adjust it to suit.— This unsigned comment was added by DCDuring (talk • contribs) at 11:31, 25 November 2008.

The intonation and "vocal inflection" are very robotic in all the examples I've heard that are computer generated. The sound just isn't fluid enough to teach an English Learner. We also have the problem that many of our IPA transcriptions are still wrong. --EncycloPetey 19:39, 25 November 2008 (UTC)[reply]

Are they good enough to help a non-IPAer detect possible IPA errors? DCDuring TALK 22:42, 25 November 2008 (UTC)[reply]

Not that I've been able to tell. --EncycloPetey 23:06, 25 November 2008 (UTC)[reply]

Cyrillic fonts

Somebody has changed something. Suddenly the acute accent is left flying high above the letter when I use the Cyrl template: а́е́ы́о́у́. Should look like áéóý. —Stephen 22:01, 25 November 2008 (UTC)[reply]

Guilty (see #Formatting Cyrillic, above). Which browser are you using? Which pages use acute accents? (I've been putting them on transliteration, to avoid having to pipe links). The above look identical in Safari/Mac.

I'll revert the style sheet change. —Michael Z. 2008-11-26 08:59 z

I've reverted my change to the style sheet, so that the Cyrillic fonts list now applies in all browsers. Reload, and let me know if that helps. —Michael Z. 2008-11-26 09:02 z

Babel files

I always have trouble with this, but I usually figure it out. This time I’m stumped, too recursive. I can’t get rid of the double image in Category:User ase-N and Category:User ase (Template:User ase). —Stephen 06:40, 26 November 2008 (UTC)[reply]

Fixed. It may not be a perfect solution, but it works. Any such category can now remove the "standard" English text by setting the parameter standard = no. Undoubtedly Robert will come up with something which works better, but I thought I'd take a crack at it. -Atelaes λάλει ἐμοί 07:43, 26 November 2008 (UTC)[reply]

Why not just use the appropriate userbox in the category header boilerplate? Then one doesn't have to replicate the text anyway, and if the userbox does something different it is reflected automatically? It would take changing User lang-N and User lang-(n, for n=0-5 I think) to only categorize in User space, so that cat doesn't appear in itself. Robert Ullmann 09:08, 27 November 2008 (UTC)[reply]

more on font templates.

I'm confused. Although I have DejaVu Sans, and a number of other good polytonic Greek fonts, the {{polytonic}} template doesn't work for me. It looks ugly. In other words,

{{lang|grc|γλαῦκ’ εἰς Ἀθήνας}}
γλαῦκ’ εἰς Ἀθήνας

...looks horrible compared to

<span style="font-family:DejaVu Sans;">γλαῦκ’ εἰς Ἀθήνας</span>
γλαῦκ’ εἰς Ἀθήνας

..even though as I understand it, the template should take the first available font from the list. What is going on? Ƿidsiþ 08:38, 26 November 2008 (UTC)[reply]

I've been making changes in these templates, but no functional changes to polytonic. Is this a recent problem?

The template and style only set the font in MSIE. If you are using any other browser, then it will use your browser's default font-choosing mechanism unless you override it. You can override it by putting the following code in your monobook.css or in your browser's user style sheet, then reloading the page (I find I have to wait up to 5 minutes between successive updates):

.polytonic { font-family: DejaVu Sans, sans-serif; }

—Michael Z. 2008-11-26 08:54 z

Interesting, thanks. No it's not a new problem, but I'm confused because it's a problem I ONLY get with {{polytonic}}. Other script templates do their job perfectly for me (I'm usually on Firefox). Ƿidsiþ 11:20, 26 November 2008 (UTC)[reply]

Also, Devanagari inside {{Deva}} is suddenly significantly larger than it used to be, not just 25% more than ordinary as it is specified inside. Can it be fixed please? --Ivan Štambuk 14:46, 26 November 2008 (UTC)[reply]

This change seems to have introduced giant Devanagari. Though it took place 3 days ago, yesterday everything still looked fine to me?! I have no idea what is the problem and how to fix it, so please Mzajac test the default output of the changed template before applying it ^_^ --Ivan Štambuk 15:23, 26 November 2008 (UTC)[reply]

I'll have a look at fixing all of this shortly. —Michael Z. 2008-11-26 16:08 z

I've made another change to the template. The problem should still be solved, by avoiding resizing the font twice. Please reload and let me know if it still looks okay. —Michael Z. 2008-11-26 17:15 z

The size problem (the cascade increasing the size twice) is also discussed on my talk page; we are fixing it. Robert Ullmann 09:10, 27 November 2008 (UTC)[reply]
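
The "resizing twice" effect is plain multiplication: a relative font size scales whatever size the element inherits, so the same percentage applied at two nested levels compounds. A short worked example, using the 25% increase mentioned above (the function name is only for illustration):

```javascript
// Relative font sizes multiply down the cascade: each nested element
// scales the size it inherits from its parent.
function effectiveScale(percentages) {
  return percentages.reduce((scale, pct) => scale * (pct / 100), 1);
}

console.log(effectiveScale([125]));      // 1.25
console.log(effectiveScale([125, 125])); // 1.5625
```

So text meant to be 25% larger ends up about 56% larger when both the template and the style sheet apply the increase.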

Template:es-noun-m

User Bequw made a well-intentioned edit to this template that should simply have allowed the plural form to link to the Spanish section of the target page. However, the template is no longer displaying the plural section correctly in the inflection line when the singular is a wikilinked compound form, and I can't figure out why. See the entry for buen gusto to see what I mean. --EncycloPetey 00:33, 27 November 2008 (UTC)[reply]

And what is wrong? If you want to link to buen gustos then you'll have to specify pl= as well; but that has always been true; it can't take apart the sg param. If you want the way en-noun works (using the PAGENAME and not sg to generate the default plural), then that could be fixed (but Bequw didn't introduce that difference ;-). Robert Ullmann 09:23, 27 November 2008 (UTC)[reply]

Duh! OK, thanks. The problem is obvious now, and I've fixed it. It's not Bequw's fault; he simply used existing code in the template that was faulty to begin with. --EncycloPetey 16:13, 27 November 2008 (UTC)[reply]

Thanks for fixing it. I'll try to copy the change to {{es-noun-f}} etc. --Bequw → ¢ • τ 19:55, 27 November 2008 (UTC)[reply]

Done. Should Template:es-noun-mf be changed too? --Bequw → ¢ • τ 20:29, 27 November 2008 (UTC)[reply]

Yes, all of them should be corrected. --EncycloPetey 19:53, 8 December 2008 (UTC)[reply]

New classes for script templates

Most script templates have an associated class, which is used in the style sheet MediaWiki:Common.css. The following templates don't, and need new class names assigned: {{Avst}}, {{Brai}}, {{Cari}}, {{Egyp}}, {{Ethi}}, {{Goth}}, {{Ital}}, {{Linb}}, {{Lyci}}, {{Lydi}}, {{Olck}}, {{Sinh}}, {{Tibt}}, {{Ugar}}, {{Xpeo}}, {{Xsux}}.

Also, some existing template names should be standardized, because they are currently a mixed bag. The majority of templates have class names following the template name and ISO script code: Avst, Brai, etc. A bunch of older ones follow language code: AR for Arabic (script code Arab), HY for Armenian (script code Armn). Others include scHebr for Hebrew (Hebr), latinx for extended Latin (Latn) or Old English language (ang-Latn).

But I'm concerned that if or when we assign all 130+ codes to class names, then we risk future conflicts. Unlike the Template: space of this project, we don't have control over which class names the MediaWiki software may assign in the future.

Should we consider adding a prefix to reduce the possibility of conflicts, like scLatn, sc-Latn, or script-Latn? —Michael Z. 2008-11-27 22:49 z

We may as well add a prefix like sc- if we are going to rename them all, but I'd not bother with that. Conrad.Irwin 00:43, 28 November 2008 (UTC)[reply]

Okay. I've thought hard about this. Even though there's no demand, I can just imagine editors wasting many collective hours of their time trying to track down a future class conflict or identify one of a hundred four-letter classes. Adding sc- to the front of each class will add minimal overhead, but help the interface document itself, and prevent potential problems.

If there's no objection, I will add new class names in the format sc-Avst, sc-Brai, etc, and then add redundant class names to the existing ones (sc-Arab, sc-fa-Arab, sc-ps-Arab, etc), in preparation for replacing them all. —Michael Z. 2008-12-01 21:39 z

objection! (;-) we had this discussion re the template names, where there is much more possible conflict, but is in fact none. The script codes were designed on purpose with a format (Xxxx) that doesn't conflict with things like language codes. This is not a large set, it is a very small set, and unlikely to grow by more than a few. Please don't use "sc-". It does conflict: "sc-Latn" is Sardinian in Latin script, right? Adding "sc-" everywhere is confusing noise. Please don't. Is fine the way you were doing it, there is no problem here. Robert Ullmann 08:48, 2 December 2008 (UTC)[reply]

But we only control the Template: namespace domain for this project, while the class-name domain is shared with all of Wikimedia. Isn't there a chance that next year's monobook skin or some cool new extension from Wikipedia will clash with a script class name?

The prefix actually does not conflict, if we define the names as starting with "sc-". If you want to tag Sardinian Latin, it would be sc-sc-Latn. If you consider this a conflict, then what about the existing class “new”—apparently every single red-linked term is in w:Nepal Bhasa.

Along with Xxxx, we have to worry about language-script combinations, in the formats xx-Xxxx (e.g., ku-Arab) and xxx-Xxxx (e.g., ota-Arab). The existing class “rfc-trans” could cause confusion, and “use-with-mention” might trouble someone who tried using a hyphen-attribute selector for language classes (which is what it is designed for). (FYI: it is technically incorrect that script codes are designed as Xxxx; they are actually case-insensitive, and we are using them as case sensitive in our class-name domain.)
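
The naming formats and the hyphen-selector concern can be checked mechanically. The sketch below is illustrative only: the patterns and function names are assumptions made for this example, and hyphenAttrMatches imitates the CSS [class|="prefix"] rule on a single attribute value rather than running a real selector engine.

```javascript
// Patterns for the naming formats discussed above (illustrative only).
const SCRIPT_CLASS = /^[A-Z][a-z]{3}$/;                 // Xxxx, e.g. Arab
const LANG_SCRIPT_CLASS = /^[a-z]{2,3}-[A-Z][a-z]{3}$/; // xx-Xxxx, xxx-Xxxx

function isScriptClass(name) {
  return SCRIPT_CLASS.test(name) || LANG_SCRIPT_CLASS.test(name);
}

// Mimics the CSS [class|="prefix"] test on a single attribute value: it
// matches the exact value, or the value followed by a hyphen.
function hyphenAttrMatches(value, prefix) {
  return value === prefix || value.startsWith(prefix + "-");
}

console.log(isScriptClass("ota-Arab"));             // true
console.log(isScriptClass("rfc-trans"));            // false
console.log(hyphenAttrMatches("rfc-trans", "rfc")); // true
```

The last line shows the worry: a case-sensitive pattern tells rfc-trans apart from a script class, but a hyphen-attribute selector keyed on the first segment does not.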

I will drop the prefix now, but please assure me that these issues can't become headaches for someone in the future. —Michael Z. 2008-12-02 16:47 z

I've updated the style sheet and respective templates by adding the new standardized class names. The old classes are still in place too, and I've left a note on each template's documentation that it will be safe to remove those in a month. —Michael Z. 2008-12-08 06:00 z

Class name for all script templates

Related to the above, I'd like to add an additional class name to every script template, to help readers and editors customize the interface. This would merely serve as a flag denoting the HTML element which may have CSS font-family, font-size, font-style, text-decoration, or other specification overriding the Wiktionary default. The name should be descriptive and self-explanatory for ease of editing style sheets. Most of these relate directly to individual writing systems, but some are language-specific (e.g., Arabic variants, like fa-Arab for Farsi), and a few are generalized (e.g., unicode for obscure characters in any context).

I'd suggest a name like styled-text, non-Latin, or foreign-script. Any better ideas? —Michael Z. 2008-11-27 23:04 z

I'd go for foreign-script, as that is what these templates are used for. styled-text is too generic, and non-Latin requires a small hop to work out the meaning. Conrad.Irwin 23:52, 27 November 2008 (UTC)[reply]

foreign-script it is. I will add this class to all of the foreign-script templates, but not the transliteration and pronunciation templates (do these want their own collective classes?). It may be several days before I get this together: I think that for uniformity and ease of maintenance, I will use another transcluded template. —Michael Z. 2008-12-01 21:48 z

"foreign" to what? "non-Latin" is a much much better name: you are going to use it in all templates other than {{Latn}} right? So it says what it is. "foreign" is wrong: it either means nothing (foreign to what) or non-English (this being the en.wikt). Vietnamese is "foreign", and in a "foreign script" (quốc ngữ) right? But it is Latin, and you are tagging non-Latin, correct? So call it that. Robert Ullmann 09:04, 2 December 2008 (UTC)[reply]

Please don't use another transcluded template, we are trying to get rid of that (all those font and font size things). It would be much better to get all this sorted, and then do the templates; right now doing them incrementally is just confusing things. Robert Ullmann 09:04, 2 December 2008 (UTC)[reply]

Latin is our native script, in English, so other scripts are foreign. non-Latin is more specific—perhaps non-Roman would be better because it won't be confused with the Latin language.

Doing them incrementally is helping me find bugs and conflicts as I go, rather than after major reorganization. For example, it got your attention regarding the template prefix sc- before I began any functional changes to existing styles, or altered the style sheet.

I thought that complex conditional parser functions were potentially expensive, but transclusions not. If I transclude, then it would be to centralize code (e.g. all script templates might transclude a central template to standardize format), rather than to further fragment it (trying to synchronize templates like {{Persian fonts}} and {{Persian font size}} with both their parent and the style sheet was a serious pain). —Michael Z. 2008-12-02 17:00 z

Conditional parser functions are not expensive at all, except when they use branches that invoke other templates or expansions that are then discarded. This is often the case because people don't understand the language. Doing #if is a few hundred machine instructions; transcluding a template is an SQL query, network RPC's to the back end, and disk ops. (modulo some caching) Many orders more expensive. (Do note that using a sub-template for the 2nd or subsequent times on the same page does not add overhead.) Robert Ullmann 12:12, 7 December 2008 (UTC)[reply]

Okay, you are my technical consultant on this one. —Michael Z. 2008-12-08 04:33 z

Absolute font size for Armenian

The Armenian script template {{Armn}} and the style sheet MediaWiki:Common.css have the following code for Armenian script:

 .HY {
       font-family: Sylfaen, Arial Unicode MS;
       font-size: 15px;
       }

Font-size should be specified using relative units: ems or percent, instead of absolute pixels. I think an equivalent to 15px in most browsers would be font-size: 115%;. Examples of random words, with Latin text for comparison:

Latin սարդ երիկամ հարցազրույց — no font
Latin սարդ երիկամ հարցազրույց — no font, 115%
Latin սարդ երիկամ հարցազրույց — no font, 140%
Latin սարդ երիկամ հարցազրույց — Armenian font at default size
Latin սարդ երիկամ հարցազրույց — {Armn} (Armenian font and absolute 15px)
Latin սարդ երիկամ հարցազրույց — Armenian font and 115%
Latin սարդ երիկամ հարցազրույց — Armenian font and 165%

Nos. 3 and 7 are sized to match the x-height of my browser's default Latin font. (I don't read Armenian, but the default font looks more attractive in Safari/Mac.) —Michael Z. 2008-11-27 23:59 z

As always the default with no font specified looks correct for me (Firefox 3/Debian Linux). Conrad.Irwin 00:42, 28 November 2008 (UTC)

In FF/WinXP, 2, 5, and 6 look the same, matched to m-height of the Latin text (which looks good). Font has no effect (I have no added fonts loaded). In IE, 2 and 6 are a tiny bit larger I think, and 5 is best. Chrome, um, won't load all of GP right now. Robert Ullmann 11:20, 29 November 2008 (UTC)
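
The 15px-to-115% estimate above depends on the font size the element inherits, which the thread does not state. As a rough sketch only, the conversion is a simple ratio; the 13px inherited size used here is an illustrative assumption, not a measured value for any skin or browser, and px_to_percent is a made-up helper name:

```python
def px_to_percent(target_px: float, inherited_px: float) -> float:
    """Return the relative font-size (in percent) that renders at
    target_px when the inherited font size is inherited_px."""
    if inherited_px <= 0:
        raise ValueError("inherited_px must be positive")
    return 100.0 * target_px / inherited_px


# ASSUMPTION: 13px is a hypothetical inherited size, for illustration only.
percent = px_to_percent(15, 13)
print(f"font-size: {round(percent)}%;")
```

With a different inherited size the percentage changes, which is the practical reason a relative unit follows the reader's settings while an absolute pixel value does not.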

Fallback fonts for script templates

Most web authoring guides recommend specifying a fall-back font at the end of the list, one of serif, sans-serif, cursive, fantasy, or monospace.[9] This may help preserve the look of a web page when the reader's system has none of the specified fonts.

~~But Wiktionary doesn't specify any default font for the page at all. Every system I've tried happens to render the site using a sans-serif font.~~ As pointed out below, Wiktionary sets sans-serif as the default site font.

Should we add a fallback font-family: sans-serif; to all of the scripts' font specifications? Or should we remove the few that are there, and leave it up to the browser, or to the reader's web browser preferences? ~~(as is already the case for the rest of the page)? I think the latter is more consistent with the philosophy evident in Wiktionary's design.~~

Are there any particular scripts which should have their own fall-back font specified, to improve readability or to differentiate the script? I have seen editors mention that it is nice to have a different font to distinguish Ancient Greek from modern Greek, although we don't have that specified right now. Personally, I would prefer to offer a unified page design by default, and provide a help page to show editors how to specify fonts for particular scripts themselves (registered users can do this in user style, others can do it with their browser's mechanism). —Michael Z. 2008-11-28 00:36 z

See http://upload.wikimedia.org/skins/monobook/main.css?184 (line 42) where Sans-serif is defined as our default font. My personal opinion is that we should not force our preferences for fonts onto other users, but define as few fonts as possible so that there are no scripts displaying completely incorrectly. As we have the default sans-serif everywhere, we should try and use sans-serif fonts where possible, though people seem to prefer the serif ones for some things :s. Conrad.Irwin 00:40, 28 November 2008 (UTC)

Oops. Then it would make sense to set sans-serif as the fallback, to match the rest of the page. —Michael Z. 2008-11-28 01:06 z

I've added the fallback generic “sans-serif” font family to all script templates, except for Hebrew which already had font-family: serif; specified. —Michael Z. 2008-12-08 06:10 z
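
For anyone checking the font lists by script rather than by eye, the rule described above (append a generic family unless the list already ends in one) might be sketched as follows. The function name with_fallback and the font name SomeHebrewFont are hypothetical, and this is not the code that was used for the actual template edits:

```python
# The five generic families named in the discussion above.
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def with_fallback(font_family: str, fallback: str = "sans-serif") -> str:
    """Append a generic fallback to a comma-separated font-family value,
    unless the list already ends with a generic family."""
    names = [name.strip() for name in font_family.split(",") if name.strip()]
    if names and names[-1].lower() in GENERIC_FAMILIES:
        return ", ".join(names)  # already has a generic fallback; leave it
    return ", ".join(names + [fallback])


print(with_fallback("Sylfaen, Arial Unicode MS"))
# "SomeHebrewFont" is a placeholder name, not the real template's font list.
print(with_fallback("SomeHebrewFont, serif"))
```

The second call leaves the value unchanged, mirroring the decision to keep Hebrew's existing serif fallback.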

Bulk-checking words

Suppose I have a biggish list of words and I want to filter it, leaving only the words that are not already in Wiktionary. Is there a good, efficient way to do that in bulk, rather than checking on a page-by-page basis? Equinox 21:08, 28 November 2008 (UTC)

Probably not a fool-proof way. We may have a page with that name, but it could be a redirect, or it could be an entry for a word in a different language. --EncycloPetey 21:50, 28 November 2008 (UTC)

EP is correct, but if you want a non-foolproof way, you can download the "all-titles-in-ns0" file from http://download.wikimedia.org/enwiktionary/latest/ and use the list-comparison technique of your choice (I've been using Access, but I'm sure there are simpler ways). As noted, that will eliminate all titles that exist in the database, including entries that may lack the relevant language or sense; that may or may not be a deal-breaker depending on your situation. It will also fail to exclude any entries created since the last database dump. -- Visviva 03:26, 29 November 2008 (UTC)
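
One simple list-comparison technique is a set lookup. The sketch below is only an illustration: the helper names are invented, the sample lines stand in for the real file, and it assumes the title file has one title per line with spaces written as underscores (worth verifying against the actual download):

```python
def load_titles(lines):
    """Build a set of page titles from lines of a title-list file."""
    titles = set()
    for line in lines:
        title = line.rstrip("\n")
        if not title:
            continue
        # ASSUMPTION: the dump writes spaces in titles as underscores.
        titles.add(title.replace("_", " "))
    return titles


def missing_words(words, titles):
    """Return the words that have no page with exactly that title."""
    return [word for word in words if word not in titles]


# Illustrative stand-in for the decompressed title file.
sample_dump = ["aardvark\n", "ice_cream\n", "zebra\n"]
titles = load_titles(sample_dump)
print(missing_words(["aardvark", "ice cream", "quokka"], titles))
```

For the real file, passing an open file object to load_titles should work the same way, since it only needs an iterable of lines. The caveats already noted still apply: a matching title may be a redirect or an entry in another language.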

Doing it in a fairly good way takes some coding, but it isn't hard (if you can do that), for example in Python with the wikipedia framework (all you need is xmlreader.py):

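    import xmlreader  # xmlreader.py from the wikipedia framework

    English = set()
    dump = xmlreader.XmlDump("en-wikt.xml")
    for entry in dump.parse():
        if ':' in entry.title: continue               # skip non-main namespaces
        if entry.text[:1] == '#': continue            # skip redirects
        if '==English==' not in entry.text: continue  # no English section
        English.add(entry.title)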

Then look at your list of words and test each one with "if word not in English:". You can get an XML dump from the last day from http://devtionary.info/w/dump/xmlu/ and, if you do code something like this, variations are fairly obvious ;-) Robert Ullmann 10:58, 29 November 2008 (UTC)
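
One such variation, sketched here without the xmlreader dependency so it can be tried on sample data: the same three tests wrapped in a function with the language as a parameter. The function names and the sample pages are made up for illustration, and the (title, text) pairs stand in for entries read from a dump:

```python
def has_language_section(title, text, language="English"):
    """Apply the same three tests as the loop above to one page."""
    if ":" in title:      # page is outside the main namespace
        return False
    if text[:1] == "#":   # page is a redirect
        return False
    return "==" + language + "==" in text


def words_lacking_entry(words, pages, language="English"):
    """pages is an iterable of (title, text) pairs, e.g. taken from a dump."""
    known = {title for title, text in pages
             if has_language_section(title, text, language)}
    return [word for word in words if word not in known]


# Hypothetical sample pages, not real dump content.
pages = [
    ("cat", "==English==\n===Noun===\n..."),
    ("gato", "==Spanish==\n===Noun===\n..."),
    ("kat", "#REDIRECT [[cat]]"),
    ("Template:foo", "==English=="),
]
print(words_lacking_entry(["cat", "gato", "kat", "dog"], pages))
```

Passing language="Spanish" instead would treat only gato as already present.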

Weird category

Can anybody tell me what {{countable}} (and several others: {{determiner}}, {{ergative}}, {{impersonal}} etc.) is doing in Category:Usage context labels? Circeus 06:54, 29 November 2008 (UTC)

Because they are used in the same way as context labels -- i.e., as tags at the beginning of the sense line. If it's a problem, perhaps there should be a general Category:Definition label templates or some such, which could include these as well as the genuine context labels. -- Visviva 12:44, 29 November 2008 (UTC)

No, it is a downright incorrect category: they are supposed to be in Category:Grammatical context labels. Category:Usage context labels is for stuff like {{slang}}, {{endearing}} and {{derogatory}} (that one is also in a completely inaccurate category, which is for stuff like {{biology}}, and I have NO idea how those cascade down from {{context}}). 17:32, 29 November 2008 (UTC)

{context} is defaulting them into "Usage..." because they use poscat=. The correct way to do this is to use tcat=grammatical and not use an explicit category. See {{countable}}. Robert Ullmann 15:37, 1 December 2008 (UTC)

Seems like a side-effect of the whole "categorization in {{context}}" debacle from last summer. Anyway, {{context}} is a complicated template. Circeus 18:44, 1 December 2008 (UTC)

Which left us with a lot of explicit cats in the templates, as well as catting from {context} ... Robert Ullmann 08:36, 2 December 2008 (UTC)

Can somebody do the edit to Template:pluralonly? Circeus 19:39, 1 December 2008 (UTC)

Done. Robert Ullmann 08:36, 2 December 2008 (UTC)