Open main menu

Bot task: Removing the arguments from the citation templateEdit

Back in the mists of time, {{citation}} took an argument, rather than relying on the name of the page. There are still some number of examples of this spread around. It'd be good to get a bot to 1) Remove the argument when it's identical to the name of the page. 2) List the uses where it's not identical for human intervention. Any bod herders willing to take this up? JesseW 18:24, 1 August 2008 (UTC)

It is possible to inspect the 1300-1400 pages at a rate of about 10 per minute using popups. I looked at about 100 of them and found one rehi that differed in capitalization. On that page the other citations template had the lowercase form in the citation template. What is the harm from having the redundant headword as an argument in the template, especially since there are not likely to be many. For example, I believe that none or the wikipedia citations added by Ullman have an argument in the template. The ones that I had added that had multiple citations templates were attempting to differentiate among spellings, forms, different senses, etc. DCDuring TALK 00:22, 2 August 2008 (UTC)
It does seem like I should investigate the actual use of the argument more; I'll see what I can hack up with api.php to do that. JesseW 01:34, 3 August 2008 (UTC)

XML dumps stuck againEdit

And based on the track record, when Brion Vibber "resets" them again, we will be on the ass-end again, waiting WEEKS or MONTHS for a dump.

Yes I know there are problems; but these are queuing issues that ARE FUCKING TRIVIAL to fix, but we get NOWHERE. What does it take to get them to ACCEPT HELP?

(shall I declare /rant now? ;-) no, not quite yet ...

Think I should apply for a WMF job? maybe help clear the two+ year queue of bugzilla bugs that haven't been answered?

Dunno. Why can't the Cantonese projects be moved from zh-yue to yue after 19 months pending?

Why can't we get a dump that takes a few hours for two months at a time? Why is the pending complete dump of the en.pedia December 24th 2008, with the date for the 7z complete next [imaginative expletive deleted] northern hemisphere spring?

WTF? Robert Ullmann 23:28, 1 August 2008 (UTC)

you should look at the dumps ... the entire current version took 5 hours 11 min, with all the talk pages etc half a day. With all the history six months? something is wrong. And needs fixing.
(note my SO went to hospital—do not know which—at 9pm, 6 UTC, have heard nothing yet; so I am sitting here at 3am local knowing nothing) Robert Ullmann 00:00, 2 August 2008 (UTC) did get an SMS, okay for now. Robert Ullmann 00:24, 2 August 2008 (UTC)
Brion says they are unpacking some new disks etc and when all that is set up it should be better; but the process still has horrible problems. Robert Ullmann 12:22, 2 August 2008 (UTC)
I noted that the 7 October dump aborted. How wonderful that we have another solution. Thanks RU. DCDuring TALK 11:00, 10 October 2008 (UTC)

Problem with column templatesEdit

In the synonyms section of I attempted to distribute the long list spanning three columns. It didn't work too well. Could someone take a look at this? __meco 08:15, 3 August 2008 (UTC)

I've given a shot. Please take a look and see what you think. Also, should the second def be "cease to exist," and not "seize to exist?" -Atelaes λάλει ἐμοί 08:22, 3 August 2008 (UTC)
Both issues are entirely my fault, sorry about that, and thanks for cleaning up the long list with synonyms. I somehow managed to confuse cease with seize; I've corrected it now. Michae2109 13:06, 3 August 2008 (UTC)
"Seize to exist"? I cannot imagine what that would imply. No, it's definitely "Cease to exist". And yes, it looks fine now! __meco 18:10, 3 August 2008 (UTC)

ligatures in Sanskrit textEdit

Hello, I am really embarrassed and anxious because the words I tried to write in Sanskrit do not appear with their proper ligatures - for example for the Sanskrit antecessor of sew - the "च्य" ("vya") should appear as a ligature, i. e. as a sole sign. I noticed a similar bug in the article namaskar too with "स्क". The most flabbergasting, however, is that as I was writing the first word in Wordpad, the two letters did merge and the ligature was available. As soon as I pasted it into the box here, the letters separated and the ligature was gone. I beg you to fix the problem, since ligatures are crucial for numerous languages written in Devanagari, as Sanskrit, Hindi and so forth. (more at Talk:sew) Bogorm 17:01, 7 August 2008 (UTC)

This is more of a Grease Pit issue.--TBC 17:04, 7 August 2008 (UTC)
I am thereby moving it, thank you for the advice. Bogorm 17:18, 7 August 2008 (UTC)
Can you see conjunct consonants here: च्य, स्क, त्त, र्प, क्र ? --Ivan Štambuk 17:44, 7 August 2008 (UTC)
No, they are still separated. I not only do see them as separated, but when I select them, I can select them one by one, which sould not be the case. They should be one characer actually. Thanks for your try to assist. (Blagodarim, is it so in your language?) Bogorm 18:44, 7 August 2008 (UTC)
Then this is an issue with your computer. I see them as unified characters, and can only select them as units. May I ask what browser and OS you're using? -Atelaes λάλει ἐμοί 18:49, 7 August 2008 (UTC)
Unified charactars? only as units? I am really dumbstruck, because that is not the case with me... Well, although I detest all forms of advertisement, this OS and two kinds of browsers - this one and this one. If that can help anyway, I would be grateful... Bogorm 19:10, 7 August 2008 (UTC)
Hmm.....well, I guess I'm not sure. I'm on Vista with FF 3, so it could be the OS, but somehow I think it unlikely. Try taking a look at w:Help:Multilingual support (Indic). There's a lot of good stuff there, including a number of good tests which might help isolate the issue. -Atelaes λάλει ἐμοί 19:58, 7 August 2008 (UTC)
Yes, this problem is in your own browser. The ligatures look fine to me. How about these: च्य, स्क, त्त, र्प, क्र? —Stephen 05:36, 10 August 2008 (UTC)
Well, I had not noticed it until recently, but these ligatures appear properly on this browser, while on the second, which I use predominantly, they are still separated. It is apparently due to the browser how to display them and I can only bewail the users with the second one, however much I am accustomed to it, since they will not be able to behold the Devanagari script properly... (to Atelaes) I do not know whether my second browser is the third version like yours, but perhaps it is not and if in the third version there are no problems I can only be glad for you... Bogorm 09:46, 11 August 2008 (UTC)

Wiktionary:Requested entriesEdit

"Requested entries" in the navigation box links to Wiktionary:Requested articles, which is a redirect to Wiktionary:Requested entries, couldn't this be changed so that one doesn't have to be redirected? /Natox 14:15, 8 August 2008 (UTC)

I updated MediaWiki:Requestedarticles-url to fix this. Mike Dillon 14:48, 8 August 2008 (UTC)


There are a large number of Spanish verb form entries edited by user:McBot (owned by user:Dmcdevit, who hasn't contributed since early June) back in April that utilise the apparently non-existant {{esbot:catline}} and also have the balance of square and curly brackets wrong (i.e.


, which doesn't work and should be one of

verb=[[{{{verb}}}]] or verb={{{[[verb]]}}}

. I'm not certain what the edits were intending to do with these so I don't know how to fix them.

I've only just become aware of them through User:Robert Ullmann/Mismatched wikisyntax#d. Thryduulf 20:26, 8 August 2008 (UTC)

The Spanish verb forms really are a mess. I know I've seen other problems with a few entries formatted by McBot (wrong categories, etc.). Also, no new entries have been created since 2006 when TheDaveRoss created them. So I think we need some kind of new bot (or an old one) to reformat / enter new forms. I don't have the knowledge to do so. Nadando 00:14, 9 August 2008 (UTC)
{{esbot:catline}} got deleted on 22 May 2008 following this "discussion" on RFDO. The discussion doesn't provide much insight on its original purposes. From the examples I checked on User:Robert Ullmann/Mismatched wikisyntax#d, it would appear that in every line starting with {~esbot everything from {~esbot to the end of the line could be deleted without causing any harm to anything. Spanish verb forms would still be a mess but AF could probably improve the formatting quite efficiently. -- Gauss 17:44, 9 August 2008 (UTC)
McBot has fixed a lot of things that were wrong, but also created a few problems; this was a run of "automatic bracket addition" that was not a good idea. When these entries were created I tried over and over to get TheDaveRoss to put in a form-of template that would be left in the entry (e.g. now {{es-verb form of}}), but he added templates for everything else that weren't needed (esbot:catline), and not the thing that was! McBot has mostly fixed that. But as Nandando says, something ought to go recheck all of them. (And not with blind regex rulesets.) In this particular case, the ones in the mismatched report may be the only problem. The "esbot:catline" templates have been removed from others, but since these are badly formatted, they don't result in links to the template, and were missed. Robert Ullmann 17:54, 9 August 2008 (UTC)
Oh, what McBot was trying to do was wikilink the "verb=" parameter in the form of template, so the entry would count in stats, but caught the verb= in the esbot:catline template. (regex not restricted enough) Robert Ullmann 18:02, 9 August 2008 (UTC)

FYI: I have been working for the last day or so on getting AF to hunt down and fix entries with {{{ in them; any number of people over time have not understood that you can't subst: a template with optional parameters unless you use some serious syntax within the template. E.g. is you have a template {new foo} with three parameters, but you don't want to use the third in a particular case, you have to write {{subst:new foo|one|two|}} so the 3rd parameter exists but is blank, else you get {{{3|}}} left in the wikitext. If you want a conditional in the template, it takes much more magic! Go look at the source to {{xhan}} ... but in any case I and AF will be hunting these. (;-) Robert Ullmann 18:21, 9 August 2008 (UTC)

Various entries, including the forms noted, are being collected in Category:Entries with template subst detritus; I think AF can probably fix all of them except a few where someone subst'd a template (King's) that have to be manually redone. Robert Ullmann 14:43, 10 August 2008 (UTC)
AF would be fixing all these, except that the ability to edit a template and have conditionals and categories re-evaluated has been thoroughly broken in the WM software. I suspected as much a few weeks ago, now have essentially confirmed it. (It is also possible the job queue is just solidly stuck, but it seemed to be running down just fine.) Robert Ullmann 17:37, 10 August 2008 (UTC)
Indeed it might have something to do with the job queue, which has always seemed to have some serious structural problems. (How can it wobble back and forth between 96K and 106K, without at the same time getting anywhere?) Robert Ullmann 22:41, 10 August 2008 (UTC)
Apparently I managed to beat on it hard enough (adding and removing text visible on the page may have helped). Or something. Anyway, AF has cleared them, with a little help; anything that it gets stuck on will show up there now. (in Category:Entries with template subst detritus). Is about 1/2 through re-screening the whole DB (as of June 13th, of course). Robert Ullmann 15:23, 12 August 2008 (UTC)

"Anchor points" in definitionsEdit

Could someone who knows about such technicalities have a look at User:Sack36's edit to argument and a similar edit to conflict. I don't know what the purpose is, but these edits seem to be forcing inappropriate line breaks in the defns. -- WikiPedant 03:38, 9 August 2008 (UTC)

Presumably the purpose is to enable links like [[conflict#N1]] (which links directly to the first noun definition). I've fixed them to use <span> instead of <div> so they won't cause line breaks, but I'm not sure this is really the best approach, since definition numbers are always in a state of flux. If we want to support that sort of direct-linking to definitions, I think we should discuss how we want to do it. —RuakhTALK 04:14, 9 August 2008 (UTC)
Thanks, Ruakh. Just looks like clutter to me. Maybe an admin should suggest to Sack36 that he cool it in the interim (and also tip him off to the fact that putting 4 tildes in the edit summaries doesn't work). -- WikiPedant 04:46, 9 August 2008 (UTC)


Can anyone tell me why the Ancient Greek characters I see in EditTools look so much sexier than the ones which {{polytonic}} actually displays within an entry? In Edittools they are all curvy and basically look much better; but within real pages the font seems different, much flatter and straighter and just less nice. I'm in Firefox if that makes any difference. Ƿidsiþ 09:37, 9 August 2008 (UTC)

It's because {{polytonic}} has been modified so as to specify fonts only for readers whose browsers have a certain bug, the theory being that browsers without this bug can handle font-selection on their own. (The bug itself actually has nothing to do with poor font-selection, except that both are deemed to be characteristic of Internet Explorer 6.) I don't know whether this theory is valid. —RuakhTALK 17:16, 9 August 2008 (UTC)
Try copying what I've got in User:Atelaes/monobook.css to your own. It's made my Greek characters much less bland, anyway. If that doesn't work, you may want to look at the Ancient Greek section of MediaWiki:Edittools. It's using a font specific formatting, as {{polytonic}} used to. One of the listed fonts is probably the one you're talking about. -Atelaes λάλει ἐμοί 18:31, 9 August 2008 (UTC)

The MSIE bug is described at w:Unicode and HTML#Web browser support, with a couple of references. MSIE picks a font for an entire text block without looking at all of the characters in it. The Wikipedia article hasn't been updated in a while and I haven't been able to do any testing since then, so I have no idea if this still holds true in MSIE 7 or beta 8. Michael Z. 2008-08-09 22:20 z

As far as you're aware, are there any browsers that distinguish passages in polytonic Greek from passages in monotonic Greek — either by recognizing lang="grc" and/or xml:lang="grc", or by noting whether a passage contains any polytonic-only characters — and use appropriate fonts for each? —RuakhTALK 15:09, 10 August 2008 (UTC)
I'm not aware of the specific requirements for polytonic. But I do notice that the three examples at the bottom of Template talk:polytonic have styling which is intended to apply in MSIE only. The three look identical and unbroken in Safari and Firefox on my machine.
I do know that Safari, and usually Firefox, automatically chooses fonts sufficient to display each character on the page, in cases where MSIE displays little rectangles, and drools on its chin a bit. So in Wikipedia and here, even though we have perfectly well-formed and valid HTML pages in Unicode text, we've had to develop, deploy, and maintain a complex array of templates, CSS classes, and categories just to show the letters!
Now some text has to be formatted in a particular font for presentation purposes, like Old Church Slavonic. Well, MSIE is also a bit dim regarding the decade-old CSS 2 standard, so instead of adding a line to the style sheet like :lang(chu) {font-family: <Slavonic font list>;}, we still have to play around with more complex workarounds involving classes and style sheet hacks for MSIE. Michael Z. 2008-08-10 19:34 z
For me (FF3) those three do not look the same. And more generally, I disagree with the view that the purpose of script templates is merely to make characters render; in many or most cases, it's to make to them render readably (and preferably while looking nice, if possible). To take a case that I understand better — many fonts include Hebrew characters, but for most it's impossible to distinguish all the vowels unless you make the text extraordinarily large. (This isn't a problem for most web-sites with Hebrew, because most of them use vowels rarely if ever, but for us it's a problem.) I believe the issue with polytonic Greek is similar; for a reader who wants to distinguish rough breathing from smooth breathing even in the presence of an acute accent, most fonts don't seem to cut it. (This isn't a problem for most web-sites with Greek, because most of them are Modern Greek, which drops the breathing marks, but for us it's a problem.) You accept the need to format Old Church Slavonic presentably, but seem to treat it as an exception, whereas I think it's actually the rule. —RuakhTALK 21:53, 10 August 2008 (UTC)
Maybe there are OS and font issues affecting your display too, I guess. You might try downloading Safari for Windows, and then using the Lucida Grande font for your default (Safari comes with the font, but this Mac guy doesn't know whether it will be available in Firefox or MSIE after installation). Unfortunately, L.G. doesn't have italics. I've found that it renders most international text, including polytonic Greek and Old Church Slavonic characters present in Unicode 5.0. It seems to render everything shown in w:Greek diacritics—I can distinguish the breathing marks, but some are pretty subtle—I'd have to read Greek to tell you whether it is good enough. Turning on font smoothing might help distinguish the breathing marks, depending on the font. Of course, none of this benefits the average reader with default settings in Windows.
Safari/Mac's default rendering of w:Greek diacritics#Lower case (with smoothing for my LCD display). It uses the Lucida Grande font supplied with the OS.
I don't know about Hebrew, but w:Hebrew alphabet seems to display nicely, even with all style sheets disabled (I do that using a bookmarklet).
Old Church Slavonic is an exception, because real support for the script only came about in Unicode 5.1 last April, no operating system yet supports it explicitly, and fonts have only started to become available since then. Now there is a real reason to start using the Cyrs script tag. Before that, we didn't use any template for OCS, and it displayed just fine on the Mac. Now we are taking advantage of this to also prefer fonts which include the newly-introduced characters, and also make the display match traditional print typography for OCS.
Regarding the purpose of the script templates, I believe that w:template:Unicode was developed first, specifically to work around MSIE bugs. The framework of templates, categories, classes, and CSS styles and resets were developed from it, and are now implemented separately on many wikiprojects. As far as I can see, every single .IPA, .Unicode, .polytonic, and :lang selector except for .mufi in w:mediawiki:Common.css is intended only to apply to MSIE. From this I presume that display of these kinds of text is acceptable in other browsers but broken in MSIE.
You're right, the script templates don't only make characters render—they also apply the lang and xml:lang attributes, which ought to be placed there for accessibility. But many of our editor hours could have been saved if it had not also been necessary to coddle MSIE. Many megabytes of font specifications could have been left out of web pages, and our style sheets would be simpler and more flexible. Michael Z. 2008-08-10 23:47 z
> From this I presume that display of these kinds of text is acceptable in other browsers but broken in MSIE.
As I said above: that's the theory, but I don't know whether the theory is valid.
As for Hebrew: can you tell the difference between אַ and אֵ? (The former has a line underneath, while the latter has two dots.) How clearly can you make it out? Because without {{Hebr}}, I can't distinguish them at all.
RuakhTALK 00:09, 11 August 2008 (UTC)
At normal text size (13px), they are vaguely distinguishable from each other, but not readable in Safari/Mac. When I hit cmd-plus to bump up the font size by one step to 15px, the dots are perfectly clear. Michael Z. 2008-08-11 00:20 z
Here's what I see (I think the two got transposed when I was cut-and-pasting):
Rendering in Safari/Mac.
  • default size: אַ אֵ
  • <big>: אַ אֵ
  • <span style="font-size:larger;">: אַ אֵ

Conjugation/declension tables for RussianEdit

Why aren't there simpler templates, requiring only what's needed for the conjugation/declension, instead of necessarily requiring the whole table? Most of the verbs conjugate regularly, and most of the nouns/adjectives decline regularly too... The entry for "знать", for example, has every conjugation of the verb in the template ru-verb-1-impf. The Russian entry uses the template Гл1a, which only requires the root зна-. And it gets even uglier for entries like "твой"; here, the whole table is drawn, there isn't even a template...

Shouldn't this template be "imported" (I don't really know how it works) to the English Wiktionary? If it's already accessible, isn't there any way a bot can scan the entries and change them?

Compare the code needed for the declension of "работа":

  • en-wiktionary: ru-noun1|рабо́та|рабо́ты|рабо́ты|рабо́т|рабо́те|рабо́там|рабо́ту|рабо́ты|рабо́той, <br/>рабо́тою|рабо́тами|о рабо́те|о рабо́тах
  • ru-wiktionary: СущЖенНеодуш1a|основа=рабо́т

~> SilvioRicardoC 02:20, 10 August 2008 (UTC)

On the contrary, there is a lot of irregularity in the verbs, and even more in the nouns. Only the adjectives are regular, and even they are unpredictable when it comes to short forms and comparatives. For твой, there are very few words like that, so a template is more trouble than adding a table. And as for the templates used in Russian Wiktionary, they are so difficult and confusing that I can’t work them, and I see a lot of errors there which I cannot fix, since I can’t do the templates. —Stephen 05:29, 10 August 2008 (UTC)
A quick count shows that the Russian Wiktionary has about 368 templates that start with Сущ (=noun?). I don't know anything about Russian declensions, and a don't know to what extent those templates are in use (or just usable) but the figure might indicate that the matter is rather complex. The maintenance of a large number of templates for a similar purpose may also be relatively laborious, and this could greatly reduce the efficiency that may seem to be gained by them. -- Gauss 15:34, 10 August 2008 (UTC)
Similar with Polish. That’s why there is a table in twój rather than a template. —Stephen 16:07, 10 August 2008 (UTC)
Any given language with complex cases should have:
  • a small number of layout templates, one for each table design
  • a reasonable number of templates for regular declensions/conjugations, that use the layout templates
  • one or several templates with a (largish) number of parameters for irregular inflections, that use the layout templates
  • in some cases templates for specific verbs (where there a just a few irregularities) are useful
Under no circumstances whatever should the table syntax be in individual entries. It becomes utterly impossible to maintain. (Suppose you want to change the colour scheme? Suppose you want to change the name of a particular case to point to a glossary entry instead of main namespace; or to a page for that case for that language? Or any of a dozen other things that should be consistent? If someone has subst'd or pasted the table syntax into a bunch of entries to make changes, you are screwed.) Robert Ullmann 17:54, 10 August 2008 (UTC)
That's exaclty what was getting me worried.
Besides the fact that some Russian (and Polish, to some extent) entries are needlessly verbose (there are a lot of russian nouns that conjugate regularly like работа, and, up to now, the existent templates still require the whole conjugation), some entries are laid out in the page itself, without a layout template... It's maintenance hell!
I'm still a newbie here, but it seems like bots are Python programs, right (I mean, they have the whole Turing's Power on their side)? Then, am I right to suppose that a bot could walk through pages and, if well programmed, "templatize" them? Abstracting regularly conjugated words like работа would be reaaally nice. Abstracting tables like the pronoun tables in Russian and Polish (see мой and mój) would be a necessity, as far as I see.
~> SilvioRicardoC 02:09, 11 August 2008 (UTC)
For the record, Polish possessive pronouns use the standard declension table for adjectives now. It is true that there could be more shortcuts for frequent special cases of {{pl-decl-noun}} but this doesn't seem to have discouraged editors to add declensions at all. -- Gauss 16:27, 13 August 2008 (UTC)
Any plans on changing the personal pronouns too? BTW, I was thinking of putting the declension of się in the Polish pronouns appendix, but I wasn't sure if I should put it in the first table (dividing it in singular, plural and reflexive)... Any thoughts on that? Maybe even have the declension tables in every pronoun entries (presence of a declension table in the entry is inconsistent right now: ja vs on)? ~> SilvioRicardoC 20:28, 13 August 2008 (UTC)

Ok, I guess I can create some basic ones. Is there any convention for names I can look at. For example, two common (usually feminine) conjugations are the ones ending in unstressed а (соба́ка, изме́на, мы́шка...) and the ones in stressed а that change root in the plural (зима́-зи́мы, жена́-жёны, теща́-тёщи...). How would these templates be named? ru-noun-f and ru-noun-f2? Of course, things aren't really simple; the biggest problem I can anticipate is ш, щ, г, х, к, requiring the next letter to be и and а instead of ы and я...

The declensions of pronouns could use some love too (the tables are being laid out in the page, no templates). Check out меня, сам, себя, мой, ничто...

~> SilvioRicardoC 20:28, 13 August 2008 (UTC)

Are there any guidelines, or best examples for conjugation and declension templates? I'd like to start working on templates for Ukrainian, but I need a starting point. Michael Z. 2008-08-20 19:04 z

I think that the templates for Polish may be quoted as good examples for declension and conjugation (only pl-conj-ai and pl-conj-ap and their (uncategorised?) shortcuts), at least for the needs of a structurally similar language such as Ukrainian. It seems that there are no guidelines other than the very sensible requirements in this and the following section of this page (mainly by Robert). -- Gauss 16:18, 23 September 2008 (UTC)
Thanks. The Polish example looks like a manageable illustration of the scope of the problem [yikes]. I suppose I'll start with the basic noun declension template first, and perhaps collect wisdom about authoring these templates at Wiktionary:Declension and conjugation templates or some such. Michael Z. 2008-09-23 18:22 z

Entries with table syntax rather than templatesEdit

split from the preceding section "Conjugation/declension tables for Russian"

  • Not directly related to this - but it would be great if someone could dump the list of all entries sorted by L2 language who use table syntax, for the regular editors to see which entries need clean-up and updating (or that list already exists somewhere?) --Ivan Štambuk 17:38, 14 August 2008 (UTC)
    • Remark to this: I observed that Turkish nouns will need a lot of attention at some point soon (probably more than I could afford anytime soon); they seem to be in a messy state. Quite a few declensions have been added (and subst:ed? or copy-pasted?), very obviously in very good faith. However, discrepancies start already by the number of cases listed and especially their order. Now I don't know anything about Turkish but if someone with at least minimal background could clarify if the figure of six cases (nominative, genitive, dative, accusative, ablative, locative) and their order (listed according to WP) are linguistically correct/standard/accepted ... reformatting could be tackled. Just in case someone feels like it ;-) -- Gauss 18:28, 14 August 2008 (UTC)
The Turkish declension at ev shows the best order. It’s the one I’ve been using and it’s what is used on Turkish Wiktionary. —Stephen 20:47, 14 August 2008 (UTC)
I've now checked all 1000-something entries in Category:Turkish nouns, and none of them is using table syntax any more without being tagged. Quite a few use {{crh-latin-noun}} (a template for Crimean Tatar which has no plural, or no singular) but that's a different cleanup task (where a bot could help a lot). Apart from spurious entries in Category:Turkish declension templates, the problems with Turkish are then mostly fixed or tagged. -- Gauss 17:38, 15 August 2008 (UTC)
    • I'm not completely sure of what you mean by all entries sorted by L2 language who use table syntax, but if you mean "find every entry that uses a subst:ed or copied-pasted table", bots should be the solution. As long as only the inflections were substitued (the table layout remained the same), this should be a pretty straightforward bot job, shouldn't it? Just recognize the table pattern, extract the inflections from it and substitute everything for a template call of the respective language that passes the inflected values as parameters. Now I just don't know who can/will help with this. I suppose there are some policies to use a bot... ~> SilvioRicardoC 20:43, 14 August 2008 (UTC)
Note that this sort of pattern matching isn't trivial; with parser conditionals it isn't just a matter of seeing if the static template text matches the page text. And further: you'd have to look at the version of the template at the time it was subst'd, and then it has to figure out if the current version is still valid (!). The automation would certainly be a lot harder (i.e. more work) than having people who know the language just add the correct template call and trash all the table text in the page. Given that there is an available template. Robert Ullmann 17:50, 15 August 2008 (UTC)
Not bots, but writing a program that will scan for explicit table syntax in XML dump of the database [1] and sort the entries by level-2 ==Language== section on some page for others to see. Technically more inclined folks here regularly do things like that.. (and this should be no problem for any decent CS/E student ;) That way we could see in which languages problem mostly lies, and to what extent bot-replacement is feasible. --Ivan Štambuk 09:09, 15 August 2008 (UTC)
He means something like this:

from XML dump, 13 June 2008:

language occurs examples
Ancient Greek 14 δύο μέγας τρεῖς τέσσαρες πολύς αὐτός ἐγώ σύ εἰμί ἔρχομαι γλαγάω εἷς ὅς
Arabic 2 كتب ليس
Aromanian 3 amu frate cãntu
Asturian 1 asturianu
Bosnian 4 biti rijeka govor Beograd
Bulgarian 2 звук вятър
Catalan 2 tu ser
Corsican 2 esse aver
Croatian 20 on one mi vi ti se ja ona ono oni sebe biti netko tko svoj moj nitko nešto ništa tvoj
Czech 5 muset studovat můj zůstat duna
Danish 1 hjul
Dutch 4 zijn hebben wezen regenen
English 2 be wit
Finnish 49 en et ei kuusi viisi seitsemän kahdeksan yhdeksän toinen -in -lainen -tar kaikki kumpikin kumpikaan kukaan älä jokainen moni
French 12 tu mourir protéger valoir moudre grêler ennuyer neiger y avoir maudire -oyer urger
German 30 sehen geben Feder sein sein ihr dein gehen Traum essen Ihr Polymer haben werden mein im können kein müssen dürfen sollen
Gothic 5 𐌱𐍂𐍉𐌸𐌰𐍂 𐍆𐌰𐌳𐌰𐍂 𐌰𐍄𐍄𐌰 𐌳𐌰𐌿𐌷𐍄𐌰𐍂 𐍄𐌰𐌲𐍂
Hindi 3 मैं हम होना
Hittite 1 𒀜𒋫𒀸
Hungarian 10 eszik iszik szeret megy tesz visz kér csinál lenni jön
Icelandic 5 jarl fjörður Krýning fursti Englendingur
Interlingua 1 tu
Irish 34 bog bean cam Éire beag folamh díreach milis géar glan salach mór ard deas bocht láidir ramhar nua óg bán dearg dubh
Istro-Romanian 2 io
Italian 960 dare empire essere tu fare vincere leggere esistere rivedere rompere sapere cadere chiudere stare piovere tenere bere io avere
Japanese 3 ある いい
Latin 108 dies is os os eo as do rumor unus tres duo tu nemo focus hippotoxotae sum arma vos calx odi alius pax hic elephantus
Latvian 1 suns
Lithuanian 6 tu mes ji jūs jis
Lower Sorbian 1 maś
Moldavian 2 фи хомосексуалитате
Norwegian 2 sur helvete
Occitan 1 anar
Old Church Slavonic 25 мати кость нога градъ бъіти домъ ва мѫжь жєна чловѣкъ дѹша пѫть отьць знамєньѥ срьдьцє азъ чьто тъ сєбє тъі мъі въі
Old English 4 lamb sæternesdæg þeos þis
Old Norse 3 bera hjalmr dyrr
Pali 1 jānāti
Polish 44 Albania Macedonia stół neon tydzień Polska ja kot ptak nóż serwer sufit szpital cis einstein Islandia ucho Słowacja delfin Stany Zjednoczone
Romanian 72 Catalonia el anterior care biet adult eu tu ea noi voi voi ei ele America orb sec tot diacritic olog speria alt meu
Romansch 3 esser pudair murir
Russian 190 весь дом мой щенок чай глаз шесть монгол быть Христос дерево лес давать брать они дать мир твой ваш наш свой дверь
Sanskrit 2 अस्मद् युष्मद्
Scottish Gaelic 103 bog bi fear saor math bard cam beag beatha fada milis glan salach peann obair dearg donn glas ruadh bàn nuadh òg
Serbian 7 biti река reka говор govor Beograd бити
Slovak 2 byť človek
Slovene 5 dan biti mleko tresti rožmarin
Spanish 1 irse
Swedish 101 hot homonym civil bred blond nit mark start gym historia kam dass regn vara vara vara vara professionalism många ark ark
Tagalog 1 ako
Tocharian A 1 käṣṣi
Turkish 48 araba ana Londra ev anne armut Atina Moskova abajur gelmek gitmek kurbağa kurabiye abla hastane kişi Eyfel Kulesi turşu aynştaynyum
Ukrainian 5 горілка око мати батяр ковбаса
Welsh 1 bod
you know, if someone were to write a program to scan the XML dump. Maybe it might look like that? Robert Ullmann 17:22, 15 August 2008 (UTC)
Thank you! Of course this list lists some "false positives" like ucho where the table syntax was intentional to distinguish declension patterns by meaning (same etymology, so it can't be split further up). -- Gauss 17:38, 15 August 2008 (UTC)
Of course it is very much out of date; I noted while randomly checking that you (and others) have fixed a number or them. Now, if we could just nominate smascherino for WOTD. (but no "foreign" words, sadly) Robert Ullmann 18:14, 15 August 2008 (UTC)
Robert, could you generate a page (in my user space) listing all 108 Latin words? Some of these are one-offs, where the inflection pattern is unique to the word, but there are many of these likely to be old additions that need cleanup. --EncycloPetey 19:09, 15 August 2008 (UTC)
Sure, User:EncycloPetey/Latin entries with tables Robert Ullmann 16:07, 16 August 2008 (UTC)

WAP (or other PDA) access to WiktionaryEdit

I was using en.wiktionary.7val for some time to access Wiktionary from my Palm Treo 680.

It was very good, but it seems to have been discontinued.

Does anyone know of a good WAP (or other PDA) interface to Wiktionary?

dda —This unsigned comment was added by Dda (talkcontribs).

I regularly use the standard on my XDA over a GRPS connection without any problems (other than the lack of proper UTF-8 support in the version of Internet Explorer I have to use). Thryduulf 00:01, 12 August 2008 (UTC)

After my initial append (above), I contacted the folks at 7val. is now fixed and working again.

And to Thryduulf... I guess you may have very cheap data rates, but I pay WAY TO MUCH for data on my phone. Therefore the MUCH smaller pages of and really mean a great deal to me. My phone can handle the full-blown pages just fine... but I don't want to pay for all the extra "guff" that really adds nothing to the experience. ALSO... on a small phone screen, I find the 7val pages much easier to read and navigate. I guess we can just say that we are all very fortunate to have a CHOICE !!!!!!!!!! dda

Use . Conrad.Irwin 08:20, 12 August 2008 (UTC)

RC header linksEdit

Gratuitous changes to the links in the RecentChanges page heading in the WM software in the last day or so have broken a number of things, I've tried to keep up. (Of course, there is no warning whatsoever of such things, they are features we should appreciate?) For example, if you have bot edits shown, and click on a number to show, it (now) loses the hidebots flag. The from= (show edits from) is also now broken.

This has broken some of the autopatrol code, and several other things. It is 4AM here, I have fixed what I can, but things keep changing. I will try again when I wake up! Robert Ullmann 01:09, 12 August 2008 (UTC)

Someone was trying to simplify something? Or just restucture code to no point? I don't know; in any case after 2-3 changes over several days, it has arrived back almost where it started. Not quite, to no purpose I can see. Anyway, some automatic patrolling has been broken, if you are/were running Rat Patrol, you need a code fix, replace:
   rclast = re.compile(r'starting from.*?from=(\d*)&amp')


   rclast = re.compile(r'starting from.*?from=(\d*)')

and it will work again. Robert Ullmann 13:16, 12 August 2008 (UTC)

iwiki linksEdit

Since we have gone an extremely long time without a dump: I am running Interwicket entirely from the same dump, but the current status of the union index of all wikts.

I seriously do not understand what the problem is; Brion said they were waiting on more file servers, but this is a ~1 hour process; even the en.wp all-current dump takes only 1/2 day. There is no reason this can't be done once a week?

Anyway, Interwicket will run (slowly), and perhaps Tbot too. Robert Ullmann 00:55, 16 August 2008 (UTC)

Update: Not running tbot; it is not very effective without a current dump, does a lot of network reads for no useful result. Interwicket is about 65% effective (i.e. ~35% of the reads find entries that were already done), so I am running two threads, about 100 updates/hour. Robert Ullmann 15:13, 18 August 2008 (UTC)

not running someone gratuitously broke Special:Allpages at about 03:00 UTC this morning. (Would it be too much to ask developers to work on the 1500 or so outstanding bugs instead of inventing things to change? *sigh*). Specifically, from=xxx is now broken, lists the all pages page index from that point instead of pages from that point. Robert Ullmann 04:46, 20 August 2008 (UTC)

fixed by changing my copy of the wikipedia python framework to use the API for allpages ... Robert Ullmann 09:06, 20 August 2008 (UTC)

Renaming usersEdit

Two users (User:Computer and User:Cool Cat) have requested renames. Currently these pages redirect to the new names, and I get an error saying that the old users do not exist when I attempt the renames. Does this mean they have been done? Does a rename simply move the pages? — Paul G 09:37, 19 August 2008 (UTC)

  • Dvortygirl has already done these. User and user talk pages get moved as part of the process. In this particular case they are a mess, but I don't think the user has any intention of contributing anything. SemperBlotto 10:05, 19 August 2008 (UTC)

I protest the existence of the "sherbert" wiki!Edit

Sherbert is simply not a word!!!!!!!!

Sherbet is a word and in America many people mispronounce the word as if there were an are like "Sure, Burt." The Oxford American Dictionary says clearly "Frequency of misuse has not changed the fact that the spelling sherbert and the pronunciation |ˈ sh ərbərt|are wrong and should not be considered acceptable variants." I don't want to dispute it with them. Sadly, Mariam-Webster has not the same standards and lists the word "sherbert" as a variant of sherbet. However, because treats the entire English speaking community, and only some American dictionaries consider "sherbert" to be a word, and no Oxford dictionary considers it a word (is not the OED the most important dictionary, too?), this wiki should not exist.

This wiki does not even site their source, and yet the allow incorrect English to continue by even considering "sherbert" a word. It's "sherbet" and pronounced "shərbit."

Can I petition the removal of this word from wiktionary please?


First off, en.wikt is a descriptionist dictionary, meaning we don't dictate usage we just catalog it. So when a high percentage of uses are spell or pronounced differently we general note it. If you still feel that sherbert should be removed you may nominate it Request for Deletion. --Bequw¢τ 19:50, 19 August 2008 (UTC)
I have used the word sherbert for about 60 years, and was amazed that the OED doesn't reflect reality. It does have these quotes . . .
  • Under sherbet "1675 COVEL in Early Voy Levant (Hakl. Soc.) 263 Your little sherbert cups and coffee dishes are made often times of the same earth."
  • Under float "A soft drink with a scoop of ice-cream (or sherbert, etc.) floating in it."
  • Under gup "1942 S. HOPE Sea Breezes 36 With little to do except drink sherbert and listen to the gupgossip, which, incidentally, they didn't understand. "
  • Under mocktail "In 12 to 14 champagne or sherbert glasses, arrange fruit..."

SemperBlotto 09:51, 20 August 2008 (UTC)


Could someone with the know-how please add Tifinagh script to the MediaWiki:Edittools..? Ƿidsiþ 07:50, 20 August 2008 (UTC)

flood flagEdit

Meta is considering adding a "flood flag" capability. In short, this would allow any admin to mark any individual edit as "bot", hiding it from RC. We can, if we wish, do the same, by achieving consensus and begging a developer. WF is one reason not to have such a thing, I suppose. I can't think of another. It would help with old deletions, such as Robert Ullmann's periodic ones. I'm not sure it'd be a good thing, but it's definitely (imo) worth discussion.—msh210 20:54, 20 August 2008 (UTC)

I can't see that this would be all that useful. Most editors likely to "flood" RC already us a bot to do these edits (e.g. Semper and Robert). The only other person who floods RC is Jyril, and seeing those edits are a nice spur to do more. So, is there any real benefit to having this capability? --EncycloPetey 21:03, 20 August 2008 (UTC)
It would not be useful for edits, I agree. It would be useful for deletions and other admin-only tasks that show up on RC (certain moves, for example; but I'm especially thinking of RU's deletions). It would, in short, be a way of allowing an admin to flag himself temporarily as a bot but still be an admin.—msh210 21:15, 20 August 2008 (UTC)
My Italian bot sometimes adds a block of wrong words (operator error) and I have to delete them all. In this case I have always wanted to mark the deletions as minor - to prevent flooding the log. I suppose marking them as a bot would make more sense. SemperBlotto 21:26, 20 August 2008 (UTC)
Hm, actually, what this proposal (on Meta) would do is make a new group, "flood", and allow admins to toggle whether why're in it. Why not just allow admins to toggle admission to the existing "bot" group? Wouldn't that be simpler? Is there a downside to it (relative to the route Meta is taking)?—msh210 21:33, 20 August 2008 (UTC)
Given that some deletions are controversial, I can see a downside to allowing admins to declare themselves a bot... particularly since WF has been made admin not once, but three times. --EncycloPetey 21:35, 20 August 2008 (UTC)
But wouldn't the same downside exist for allowing admins to declare themselves a "flood" (i.e., hidden from RC)? I'm not seeing a difference.—msh210 21:39, 20 August 2008 (UTC)
It would indeed. I don't see any benefit to allowing either declaration, but am seeing downsides. --EncycloPetey 21:40, 20 August 2008 (UTC)
Seems to me this is just a large set of problems waiting to happen. All that is wanted here is "better" filtering of RC, right? Someone added a bug request for a "Hide logs" link in RC, which was shot down as "feature creep" ... it would have accomplished this much more easily. (Has anyone thought about the effect of "flood" on patrolling, etc? Would we need a "show flood" link in RC? ;-) Keep in mind that a block of deletes has to appear in the delete log anyway. If someone has enhanced RC on, it will show as only one line. See no benefit here. I would be surprised if the developers considered it, but then they surprise me all the time. In any event, we should not use it IMO. Robert Ullmann 04:31, 21 August 2008 (UTC)

Broken TOC in TopicsEdit

The call to {{categoryTOC-lc}} on the page for Category:*Topics is not displaying properly. I do not know why, since the template itself looks fine, and I'm not familiar with a couple of the items used in the template. --EncycloPetey 05:17, 22 August 2008 (UTC)

Someone has "helpfully" broken PAGENAME in the same way as #switch, so that what it returns is treated as the start of a wikitext line, and parsed. The "*" then creates a list item in the middle of the URL. Robert Ullmann 05:47, 22 August 2008 (UTC)
worked around Robert Ullmann 06:00, 22 August 2008 (UTC)


I recently added tistis to en.wikt, which is a Seneca word for woodpecker. The language has ISO code {{see}}, which is of course already taken and is much used. I guess eventually {{see}} will have to be renamed, or maybe merged into {{xsee}}. --Jackofclubs 10:47, 24 August 2008 (UTC)

{{xsee}} works in a slightly different way, (I proposed that they be merged once, for a similar reason, but that did not happen). It seems that we will need to do something about this at some point, and that the current set-up relies on having {{see}} reading Seneca. We have two choices, either change the myriads of templates that use the language-code templates, perhaps to use a prefix such as lang: before the language name. This shouldn't be too obvious to the editors as the language templates should never be used directly anyway. This may be something we want to do anyway, as reserving all short template names confuses those who expect {{tl}} and {{bot}} to work as on Wikipedia (it would also make Special:AllPages/Template: look a bit tidier :). The other solution I can think of is to bot-replace all 42,000 occurances of {{see}} with {{seealso}}. This would take a day or two of intense bot-work. My preferred solution is to bot-replace all occurrences of Template:see soon, and leave a note on the template page explaining the change. Are there any difficulties I have overlooked? Conrad.Irwin 12:09, 24 August 2008 (UTC)
First of all, please note that "language templates should never be used directly" is false: the primary purpose of the templates is in translation tables, where they are to be subst'd, either by the user adding them or by AF. A very large majority of the other wikts use the 2 and 3 letter templates this way, which is why we do the same, and will certainly continue to.
It doesn't have to be that hard. We can use {{iro-sen}} or such for Seneca if we want to (for now). We don't have to worry about iwiki links.
{{see also}} already redirects to {{see}}, this redirect could be inverted. {{xsee}} could be merged in, but is a non-problem.
The last time(s) this was raised the conclusion was to wait until we (someday? finally?) got the Did you mean extension that has been promised forever; which then would mean we could remove most instances of {see}.
There is utterly no urgency, except for [never mind]. We can use iro-sen or something starting right now if we desperately need to code Seneca. Robert Ullmann 12:22, 24 August 2008 (UTC)
One more detail: even if we were to bot-convert "see" to "see also" today, the redirect would have to remain in place for 6-12 months until most editors learn not to use "see" any more; then it could be re-purposed. Otherwise you will drive many people seriously nuts trying to understand why "Seneca" is turning up at the top of the page they are editing. Robert Ullmann 12:33, 24 August 2008 (UTC)
Well, seeing as we have two quite capable bot writers, who have both already completed a task like this, would one of you start attacking {{see}}? I think six to twelve months a bit much. That's something like two or three generations here on Wiktionary. Maybe we could wait a month or two. -Atelaes λάλει ἐμοί 18:28, 24 August 2008 (UTC)
It has nothing to do with how long it takes to do bot-work. It has to do with the simple fact that editors who aren't here every day are going to be turning up over a fairly long period of time, expecting {see} to work. That is, not produce "Seneca". So {see} has to remain a redirect to {see also} (or whatever) for a significant length of time.
Also note that the last time I suggested replacing {see} with {see also} there were objections; I'd suggest waiting to see what others have to say. Adding a convert rule to AF to munch through them, both old and the new ones people will add, is pretty trivial. (And I have another sneaky fix in mind that we could use once most are done.) Robert Ullmann 18:37, 24 August 2008 (UTC)
What about using the new template name {{also}} in place of {{see}}? This would still be intuitive as a name, but would keep the name short and not require remembering whether a space is needed in the template call. --EncycloPetey 00:06, 25 August 2008 (UTC)
I like that. Michael Z. 2008-08-25 06:25 z
Agree. A bot will probably be needed to make the switch, though. Even outright adding instruction to replace it in AutoFormat (because it'll take a while for everybody to really notice, I'd assume) might be a good idea. Circeus 14:06, 25 August 2008 (UTC)
+1, brilliant. —RuakhTALK 14:47, 25 August 2008 (UTC)
I like it. --Bequw¢τ 02:37, 26 August 2008 (UTC)
(my initial reaction to {also} was "yuck", but I know enough to wait a while ;-) Yes, that is pretty good, esp. since one can perfectly well say "also this and that", without the word "see". So it reads well in the wikitext: {{also|Cat|CAT|cāt}} ... we should find out what Hippietrail thinks, he commented before; someone has already left him a note. Robert Ullmann 18:44, 26 August 2008 (UTC)

Process: If we want to convert from {see} to {also}, I'd suggest doing the following:

  • make {also} a redirect to {see}
  • give AF a rule to convert {see} to {also} (and {{see also}} to {also})
  • (forget about it for a while ;-)
  • reverse the redirect (when something like 1/2 done)
  • let AF sort the rest while also handling new additions of {see}
  • somewhere in here, change {see} from a redirect to a bit of magic that returns "Seneca" with no params, and invokes {also} with params (!) we don't want to do this at first, lot of overhead (note that AF will only convert instances with parameters)
  • start using code see for Seneca
  • eventually reduce it to the language template, so that subst'ing it won't make a mess

Along the way, inform people that they should be using {also} instead. But do note that what people see in entries is the most effective way to do this. Robert Ullmann 18:44, 26 August 2008 (UTC)

I am trying this out; created the redirect from {{also}}, and converting some. You can look at the result. (Most amusingly at free which is the first entry in the wikt (mainspace, oldid=23 ;-). Can always go somewhere else. Robert Ullmann 01:52, 27 August 2008 (UTC)
I like the logic of this migration process. So, it wouldn't much matter when a contributor were to switch from using {{see}} to {{also}} as long as it was before the introduction of the magic version of "see". Even then the cost would be limited to the overhead. OTOH, if a user started using "also" it would provide more instances for others to observe and ask about. Cool. DCDuring TALK 02:13, 27 August 2008 (UTC)
It looks like a sound plan to me, but we should also be sure to update style guides to reflect the change. I've updated Wiktionary:About Latin and Wiktionary:About Hungarian to reflect the change in template name, and Msh210 has done Help:Internal links. --EncycloPetey 03:04, 28 August 2008 (UTC)
Note: AF also has rules for {{see also}} and {{See}} (both redirects to see) Robert Ullmann 12:00, 27 August 2008 (UTC)
Would it be possible to add an sc= capability to {{also}}? At least with grc, a good font is especially important for seeing the difference between the words (which are often only different by some minuscule diacritic). -Atelaes λάλει ἐμοί 02:30, 4 September 2008 (UTC)
Would apply to all the titles, which isn't desirable; often the entire point is that they are in different scripts. Use {{xsee}} and wrap the ones you want with the font template you want. (after linking them) We could merge this into see/also with use of {{wlink}} at some point. (we should probably have {{xalso}} as a redirect to save someone confusion?) Robert Ullmann 12:48, 4 September 2008 (UTC)

missing * before IPAEdit

While working on User:Robert Ullmann/Pronunciation exceptions at the suggestion of Thryduulf, I've discovered a large (12-13K) set of entries that have just the IPA pronunciation, with no * at the start of the line. A lot of them are Romanian entries that Ric added, which do have lang=ro. Quite a few others are French.

I'm teaching AF to add the * if it can match the whole line, and add the appropriate lang= in this case. (not the general case of IPA templates in FL pronunciation sections yet). Robert Ullmann 09:32, 26 August 2008 (UTC)

Ah, here's an oddity: if you use {IPA} with no lang=, or lang=en, it generates a link to w:IPA_chart_for_English, but if you give it a language code not in the small set it "knows", it links to w:English phonology. Which is a bit confusing? I can fix it, but where should it link? Robert Ullmann 16:41, 26 August 2008 (UTC)

w:International Phonetic Alphabet would seem to be a logical destination. Thryduulf 23:48, 26 August 2008 (UTC)
Alternatively, there is Wiktionary:IPA pronunciation key. The advantage is that its less wordy than the Wikipedia article. The disadvantage is that it lacks explanatory text altogether. --EncycloPetey 03:02, 28 August 2008 (UTC)
I've added a two sentence introduction to that page. It needs more work though, as for example the ɹ symbol is completely missing from the chart. Thryduulf 12:29, 28 August 2008 (UTC)
I would think our pron key would be the best, it can link WP etc. Robert Ullmann 15:45, 28 August 2008 (UTC)
Sounds good to me. Personally, I would prefer a link to w:lang phonology. In my experience, these often exist, although admittedly not always. -Atelaes λάλει ἐμοί 19:05, 28 August 2008 (UTC)
As I understand it -
  • If "IPA chart for lang" exists it will link there.
  • If the above doesn't exist and "lang phonology" does, it will link to that.
  • If neither exist it needs somewhere to link to. Currently this is w:English phonology, which isn't appropriate, and so the discussion above is to chose a better target. Thryduulf 01:15, 29 August 2008 (UTC)
not exactly; it does this:
  • if the language is English (default) it links to IPA chart for English (in the wikt)
  • if the language is something else in a specific list in the template, it links to w:(language) phonology — note that it can't test for the existence of the WP entry
  • if the langusge isn't in the list, it links to w:English phonology (which is the issue I raised)
we could always link to w:(language) phonology, but we'd have to add some redirects to the 'pedia, and have some way of checking or maintaining them (w/o adding yet another maintenance task that will go on forever ;-) Keeping the list in the template for now is probably fine. (I will improve how it is done ;-) Robert Ullmann 11:40, 29 August 2008 (UTC)

Should translingual pronunciation sections (e.g. that at ) link to the same place? What should the lang= parameter contain for these entries? Thryduulf 23:08, 29 August 2008 (UTC)

I don't think that entry should have a translingual section at all. This is a letter, used by a certain set of languages (e.g. Sanskrit, Marathi, Hindi, etc.). There is no pronunciation in a translingual context, only in a language specific context. -Atelaes λάλει ἐμοί 23:53, 29 August 2008 (UTC)

I'm really thinking we should not rely on this list. I just discovered that grc does not work in {{IPA}}, and so I was thinking of adding it, but then realized what kind of queue that would create. It seems a little silly to do this every time we want to add another language. I think we should simply have it link to [[w:lang phonology]] every time there's a language inputted. Granted, wikipedia does not have a phonology article for every language, but I have to imagine that they intend to, and will, in time. For those languages which don't have articles, we could probably just go over there and create redirects to the main language article, perhaps to the section about phonology, if it exists (and make sure to wash your hands afterwards, before editing Wiktionary again). -Atelaes λάλει ἐμοί 01:37, 30 August 2008 (UTC)

Agreed. If we're worried, we can bring this up at their equivalent of the beer parlour (one of the village pumps, I think) and confirm that they're O.K. with standardizing on [[<language name> language]] and [[<language name> phonology]] as article names (or at least, on offering redirects from those names). —RuakhTALK 02:41, 30 August 2008 (UTC)
From what I understand about redirects, this is a non-issue; there will be no problem creating a redirect where there is no (language) phonology page; it would be encouraged. The issue is somehow checking that we have the ones we need, i.e. have IPA templates that point to them. Robert Ullmann 15:12, 31 August 2008 (UTC)

Other rulesEdit

I've added some more rules to AF to handle cases turning up in the exception report. I was going to test them in AWB first, but AWB is not working any more.

Rules are converting RP, UK, and US to the {a} versions in some cases, also some things like "Zhangzhou". And one special rule: there are a large number of entries which look like this:

* foo, /fu/, /<tt>fu</tt>/

which it will convert to enPR, IPA, SAMPA; I've looked at quite a number of them and they are very consistant. Robert Ullmann 15:45, 28 August 2008 (UTC)

This same task would be usefully applied to the Rhymes namespace also - see Rhymes:English:-ɜː(r)l for example. Thryduulf 14:00, 30 August 2008 (UTC)
Erk... I just noticed an example of this format in an Italian section (sei#Italian). When this format appears in a non-Enlgish section, the first portion is not enPR, since that is English-specific. In particular, Italian entries show the "dictionary accented" form that Italian print dictionaries use. This is (again), not enPR, so AF will need to be certain not to consider it so when cleaning up FL Pronunciation sections. --EncycloPetey 00:18, 1 September 2008 (UTC)
Does the dictionary accented form give enough information to be considered a pronunciation guide? Should we create {{itPR}} for it? —RuakhTALK 00:31, 1 September 2008 (UTC)
Not really. All it does is place an accent on the word at the place of stress. That's all. --EncycloPetey 00:33, 1 September 2008 (UTC)
Changed to only fire rules containing "enPR" when section lcode = "en". I've been looking at most of these while doing my usual monitoring of AF, and haven't seen it convert any previously. At some point I should be looking for the general case of enPR outside English sections. (If we ever get an XML dump?) Ah, it has dozen or so of these while I was sleeping; have fixed them will review more in a little bit. Robert Ullmann 14:35, 1 September 2008 (UTC)

Templating rhymesEdit

Also from the pronunciation formatting work, Thryduulf suggests AF could convert un-templated instances:

Rhymes: *\[\[Rhymes:English:-(?P<s>.+?)\|-(?P=s)\]\] → {{rhymes|\1}}

I.e. if the two instances of the IPA suffix match, then replace with the template. This just does English, but it isn't clear that there are any FL language entries that don't use the template. If there are, we'll find them a bit later.

Do read User talk:Robert Ullmann/Pronunciation exceptions if any of this interests you. Robert Ullmann 19:00, 26 August 2008 (UTC)

The coding above is missing an initial asterisk in the output (if I'm reading it correctly). It might also be worth checking to see that the target page exists. I've come across a number of rhymes page calls where the target did not exist. It would be nice to track those, as sometimes it is the result of incorrect IPA characters and sometimes the result of something more seriously wrong. --EncycloPetey 19:06, 26 August 2008 (UTC)
There's no asterisk missing, no. (The above isn't matching the entire line; it's just templatizing the part of the line that it does match. So, it won't touch initial asterisks/bullets, colons/indents, etc.) As for target-page existence, that sounds like a somewhat separate task — maybe it could use the XML dumps to generate something like Special:WantedPages for the Rhymes namespace? —RuakhTALK 19:51, 26 August 2008 (UTC)
yes, this doesn't affect the stackable wikitext formatting preceding it. See December et al. Robert Ullmann 01:55, 27 August 2008 (UTC)
There is quite a bit a lot of work needs to be done on the Rhymes: namespace, as there are literally thousands of words that rhyme that are not included in the rhymes pages. A while back (1-2 months ago?) someone suggested converting the manual rhymes pages we currently have into something that works like categories do, so that every time at word is noted as rhyming with, e.g. -ɒp it gets added to the list of rhymes at that page. Although how the syllable split would be managed I don't know. I don't remember that this discussion ever came to any conclusions (or where it was).
Another problem is that our IPA transcriptions use [ɹ] for the sound represented by the English letter "r" (e.g. car is /kɑː(ɹ)/) but the Rhymes pages use [r] (e.g. Rhymes: -ɑː(r)). Changing this would require significant bot work.
As noted above, there are also many rhymes pages that don't yet exist. The templatising of the calls to the rhymes namespace will be a useful first step to sorting it all out imho. Thryduulf 23:44, 26 August 2008 (UTC)
with these "templatized", we can then gen up a cat with the missing pages. Robert Ullmann 01:55, 27 August 2008 (UTC)

Also related to the Rhymes, a lot of the Rhyme lines in pronunciation sections were indented with *: rather than the * of the pronunciation transcriptions. The current standard seems to be that this isn't done. Unless anyone objects, I'd suggest that AF removes the : indent from these lines where it finds them. However as this is such a trivial change, I'd set it as one of AF's minor changes so that it doesn't bother saving the edit just for this. Thryduulf 14:09, 27 August 2008 (UTC)

As I understand it, the original rationale for the indenting was that each rhymes link would be tied to a particular pronunciation and indented under it. However, with the current format of the Pronunciation sections this makes no sense, since our pronunciations are grouped geographically (with more than one pronunciation on a line), and since the rhymes are all keyed and named only using the UK phonetic transcriptions. I agree that the colon ought to be removed, and we might see about changing this in WT:ELE as well. --EncycloPetey 02:59, 28 August 2008 (UTC)
Both WT:ELE and WT:PRON say the indent should be :* which is wrong, it breaks the first level unordered list. (Yes, this only affects someone reading the XHTML, but that is done. Javascript or whatever reading the structure should only see one unordered list.) The correct way to indent a bullet within a list is **. Since indents have been used with semantic meaning (as EP notes), I'm a bit reluctant to just remove them automatically, at least until AF is taught more about the entire section, rather than a small set of ad hoc rules. As we munch through the problems list, we can get into more details, eventually flagging any Pron section that cannot be parsed. (I am saying I'm reluctant, but if it really looks okay to do somewhat blindly, okay; I would convert ":* {{rhymes" and "*: {{rhymes" to "* {{rhymes") Robert Ullmann 10:43, 28 August 2008 (UTC)
... and since the rhymes are all keyed and named only using the UK phonetic transcriptions. Um, where did this idea come from? I doubt that kind of UK-centricity would pass. And it hardly addresses other languages. Robert Ullmann 15:15, 31 August 2008 (UTC)
I'm guessing that since a lot of pronunciation differences between the UK and US are predictable (e.g. [ɒ] -> [ɑ]) it was felt that there wasn't any point in maintaining different Rhymes pages for them. Where there are very different pronunciations (either between regions, or between different parts of speech) I've been adding two rhymes templates followed by {{qualifier}} templates. As mentioned in several places elsewhere the Rhymes namespace needs a lot of work (in lots of different respects), and so I'm tempted to leave the status quo as is for the moment and then deal with all the issues in one fell swoop (perhaps we should make a list of the issues somewhere). Thryduulf 10:49, 1 September 2008 (UTC)

Lupin popupsEdit

Did something break this? In the last few days they haven't been able to load anything besides the title of the page, at least for me. Nadando 02:40, 28 August 2008 (UTC)

w:Wikipedia:Tools/Navigation popups#Wikipedians who have helped lists, among others, “Yurik - with his fantastic mediawiki BotQuery extension”. Said extension was disabled this past Monday (see [[#The API is dead! Long live the API!]]); apparently Lupin popups didn't survive. —RuakhTALK 02:50, 28 August 2008 (UTC)
Oh, too bad. Made patrolling so much easier. Nadando 02:58, 28 August 2008 (UTC)
It isn't hard to convert to api.php, I've done various things over the last months (from UI calls to API, not from QUERY, but is similar) Lupin is widely enough used on the 'pedia that I expect someone will fix it, if not already. We can then steal. Robert Ullmann 10:49, 28 August 2008 (UTC)
Well, prior, the only that would ever load was the (inexistent) intro anyway, as I commented here. Circeus 18:14, 29 August 2008 (UTC)
The only function I'm currently missing is getting counts of the number of items in categories beyond the first 10. This slows me down on working cleanup lists that often have more than ten refractory items that have taken up semi-permanent residence on said lists. That I still have other functions could be the result of not having flushed a cache I suppose. DCDuring TALK 19:26, 29 August 2008 (UTC)
Re: flushed cache: no, the APIs are strictly server-side. So, it probably means that the functions are being restored and Lupin popups is being converted from using query.php to using api.php. (Unfortunately, I don't use it, and don't know anything about it, so can't give you more information than that.) —RuakhTALK 19:32, 29 August 2008 (UTC)
My popups is working fine... if you've still got problems you might want to try copying these lines from my monobook.js - which import an old version of popups. Conrad.Irwin 00:30, 2 September 2008 (UTC)
importScript('User:Connel MacKenzie/mess-with-popups.js');
The one bit of functionality change that I've noticed that persists after inserting the above code is that the count of items in categories is gone if there are more than ten items in the category. DCDuring TALK 03:17, 2 September 2008 (UTC)