Wiktionary:Grease pit/2008/September

monobook.js - can it do this?

Is it possible to automatically expand templates that are collapsed by default? I'm talking in particular about the various Spanish conjugation templates: Template:es-conj-ar, Template:es-conj-egir, Template:es-conj-car, etc., which may all be found at Category:Spanish conjugation templates. By default they're collapsed with a button to show them on the right. I can see how this would be helpful for some people, but I would rather have them expanded automatically. Is there a .js or .css script available to do this? Thanks, FlamingSilmaril 14:26, 1 September 2008 (UTC)

Yes- go to WT:PREFS and check "Alternatively, leave ALL translation sections expanded (and similar hidden sections) - Default is to leave them collapsed". Nadando 16:56, 1 September 2008 (UTC)
Great, thanks! FlamingSilmaril 18:45, 1 September 2008 (UTC)
Perhaps WT:PREFS should include a note on how to rig your monobook.js so that the chosen pref travels with you from computer to computer.—msh210 21:53, 2 September 2008 (UTC)

Template:hy-noun

Opiaterein has disabled the tr= function of this template, making the transcription the unnamed parameter {{{1}}} instead. How strongly do we care about using tr= to consistently mark transliterations in non-Latin script templates? --EncycloPetey 17:15, 2 September 2008 (UTC)

Fairly strongly. Certainly if a template already uses tr= for the transliteration, breaking it is unacceptable. (From the note on your talk page, Ric seems to be under the mistaken impression that {1} can't be used for plural form, etc., while leaving tr= as is? Does he not understand that named parameters aren't counted? That in, e.g., {{hy-noun|plur|tr=trans}}, "plur" is {1}? And do note the edit summary, please.) In any case, it should be put back and extended properly. Robert Ullmann 17:30, 2 September 2008 (UTC)
He understands, but as he notes on his talk page, he prefers the unnumbered linear style. --EncycloPetey 17:37, 2 September 2008 (UTC)
'kay. Just another little mess someone will clean up someday. Robert Ullmann 16:07, 5 September 2008 (UTC)

Template:sense

Circeus has added a space to the end of this template. I don't see why anyone using the template wouldn't follow it with a space anyway. This now means that we get two spaces after the colon (if the template is set up to produce a colon) instead of one; it also suggests that he/she has used it on at least one page and not explicitly followed it with a space, meaning that if we roll this back, we will have missing spaces on those pages. — Paul G 11:28, 5 September 2008 (UTC)

I would think it would be good to place a plain space after the colon, to always ensure a space following it.
A non-breaking space is problematic, because it will sometimes create a double space where there is another space after the template. This will give an inconsistent presentation, and modern typography uses single spaces after colons anyway (double spaces are a typewriting convention, and a characteristic of Victorian typography). Michael Z. 2008-09-05 14:11 z
I see no reason to include a space at the end of this template. We don't do it to any other template, and any sane person would put a space between the colon displayed at the end of this template and any following words. --EncycloPetey 15:42, 5 September 2008 (UTC)
He was editing angle bracket, added a sense usage without a space, then added the space to the template instead of the entry? Makes little sense to me ;-) There isn't any reason for this; I'm just going to put it back the way it was. Right now, all of our other pages that use {sense} are displaying extra space. (He subsequently added a use of {sense} to debacle w/o depending on the space ...) Robert Ullmann 16:04, 5 September 2008 (UTC)
There should be one plain space following the colon in the template. In HTML, multiple whitespace characters in the code get rendered as a single space on the page. So even if one editor ever forgets to add a space once, this will still display correctly, instead of having subsequent text set flush with the colon (which is always wrong in English). In all other cases it will have no effect. Michael Z. 2008-09-05 16:57 z
No. It creates the opposite problem: editors leaving out the desired space in the wikitext because it "isn't needed" or they don't notice, or (worst) "it doesn't belong there, it is in the template!". Much more severe problem than the occasional omitted space in the text. Robert Ullmann 17:39, 5 September 2008 (UTC)
I don't see what the problem is—searching in the wikitext for terms which follow the template?
Following the programmer's dictum of being liberal in what input to accept and conservative in what to output, the template may as well prevent that occasional omitted space from ever occurring.
We don't do it for other templates, but other templates don't end with a colon. Come to think of it, offhand I can think of no reason not to add a plain space to context templates and all others in round brackets. Michael Z. 2008-09-07 20:09 z
Normally, I'd make a case for adding a space to the {{context}} derivatives too. However, I think it is amongst the AutoFormat fixes (which would certainly explain why I so rarely run into it in articles), and if it isn't, it should be easy to add should it be felt a good idea; hence I would tend to agree it is not strictly needed. This was sort of a reverse reflex from Wikipedia, where I am constantly editing templates to remove extraneous whitespace. Circeus 03:38, 8 September 2008 (UTC)

Special:WantedPages

I think it is time that this page was regenerated. A lot has happened here since September 2007, and I for one would like to see a new list there. I understand that there were some performance issues with having this done on a regular basis, but a one-time regeneration would be sufficient for several months. – Krun 16:26, 5 September 2008 (UTC)

Unfortunately, we haven't had an XML dump for many months now (June), so anything generated would already be three months out-of-date. If you can add your voice to those clamoring for an XML dump, it might help push that into happening. --EncycloPetey 18:04, 5 September 2008 (UTC)
Different things. The Special page is generated from the SQL database; the same task queue generates the others (which, note, have been updated twice a week all along). But trying to get it re-enabled will bump you into the same swamped couple of people who can't get the XML dumps going ... (;-) Robert Ullmann 18:41, 5 September 2008 (UTC)
Without a new XML dump, even a workaround like User:Connel_MacKenzie/Wantedpages cannot be regenerated. -- Gauss 11:53, 7 September 2008 (UTC)
Relatedly: Wiktionary:Most_missed_articles. Conrad.Irwin 12:00, 7 September 2008 (UTC)
Looking at this last, many seem to be poorly selected, obsolete, and untested links coming from WP. Would there be an easy way to generate a list of bad WP links to Wiktionary? Is there an easy way to do this at WP? I haven't looked there yet. DCDuring TALK 12:58, 7 September 2008 (UTC)

order of templates

How would people feel about changing the order of the templates list during an edit so comparative comes before superlative? RJFJR 00:30, 6 September 2008 (UTC)

That makes loads of sense. Done. —RuakhTALK 01:19, 6 September 2008 (UTC)

Uncategorized pages strangeness

Special:UncategorizedPages includes the Italian noun incidente stradale. This is in Category:Italian nouns, and has been since the entry was created on 31 August. Any ideas? SemperBlotto 10:21, 7 September 2008 (UTC)

p.s. Also scioperi della fame

These two entries don't appear in their respective categories at present. Looks like the cat membership didn't get updated on the save. (I'm noting more and more of this from the WM s/w, not being able to update cats on saves and things like template edits; I think there is a serious bug out there.) If we purge these two pages, I suspect they will show up in the cats just fine, and not be in the next update of the special page. Robert Ullmann 12:51, 7 September 2008 (UTC)
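(For anyone wanting to force such an update by hand, hitting ?action=purge on each affected page does it, and that is easily scripted. A minimal sketch, Python 2 standard library only; the two titles are the entries above, and note that MediaWiki may answer an anonymous GET with a confirmation form rather than purging outright:)

    # Request ?action=purge for each page so its category membership
    # is recalculated without needing a dummy edit.
    import urllib

    BASE = "http://en.wiktionary.org/w/index.php"

    def purge(title):
        params = urllib.urlencode({"title": title, "action": "purge"})
        return urllib.urlopen(BASE + "?" + params).read()

    for t in ["incidente stradale", "scioperi della fame"]:
        purge(t)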

converting "old" ety templates to Template:etyl edit

As Atelaes has been asking for AF to convert templates to {{etyl}}, I've done some analysis and set up a list for AF.

AF will then be converting them over time, while also catching new uses by occasional contributors or those that haven't shifted to "etyl" yet. Comments anyone? Robert Ullmann 15:50, 7 September 2008 (UTC)

Am I correct in assuming that the conversion process will not convert the handful of templates that involve "languages" that do not have ISO 639 codes? (New Latin, Late Latin, Medieval Latin, Vulgar Latin are the ones I'm familiar with.) Or is there another way of handling these without losing information? DCDuring TALK 18:14, 7 September 2008 (UTC)
That's what his file says. Just the easy ones. --Bequw¢τ 19:24, 7 September 2008 (UTC)
Right. Late Latin can't be used in {etyl} as it stands now even if we invented a non-standard code, as it would link to w:Late Latin language. The language ages and groups and such we want to refer to in etys have to be done differently. (I have several different ideas.) In the meantime, it would be good not to have {{G.}} for German, and {{Ger.}} for Germanic. The first goes to {etyl|de}, the latter to {proto|Germanic} or to a group (w:Germanic languages? Yup.), but we'd have to look at how it is being used. Possibly both ways ... Robert Ullmann 17:48, 8 September 2008 (UTC)

Template:Bolivia

I have just created the Regional template: Template:Bolivia. Although the pages pito, pega, and cachar seem to use this template, they don't appear in Category:Bolivian Spanish. Probably I have overlooked something, as I am no template expert. Could someone please help to fix this problem? Matthias Buchmeier 10:53, 8 September 2008 (UTC)

(can't edit this page any more)
Too big? Something broken? Anyway, won't load completely. If anyone can combine this section with above?
The Template:Bolivia problem is broken WM s/w: when #ifexists was changed to explicitly add links to the link table, we were told it was so pages could be updated if an entry was added. This, of course, has never actually worked. In the meantime, the ability of the s/w to update categories has been slowly deteriorating. (Note the section two above that one as well.) Only fix is to purge the relevant pages. If you look at Special:WhatLinksHere/Template:Bolivia you will see that they do appear in the links table. Robert Ullmann 16:44, 8 September 2008 (UTC)
I've archived March, and fixed some {{ … }} imbalances that were making MediaWiki act crazy and not show [edit] links for a whole bunch of segments; hopefully one or both of these changes will make you able to edit? —RuakhTALK 17:06, 8 September 2008 (UTC)
Yes, thank you. It would display the whole page, but I have to be able to section edit. And as you note, no links. (And not even section numbers when I manually put them in the edit URL ;-) Robert Ullmann 17:42, 8 September 2008 (UTC)
Does that mean that it would be sufficient to purge only the template, or that all relevant pages including the category and entry pages have to be purged? Do you think there is a chance that future MW versions will fix this problem? Matthias Buchmeier 09:57, 10 September 2008 (UTC)
Purging the pages. Editing the template should work, but almost always does not. Sure, they might fix it someday, but since they broke it essentially on purpose, I wouldn't hold your breath. Robert Ullmann 15:50, 11 September 2008 (UTC)

{{quote-book}} is great, but…

…it needs some work.

  • there is often a comma after the last item on the first line; there should not be
  • it would be great to have a parameter to add additional information (such as the w:Stephanus pagination number for a work of Plato)
  • maybe we should make a second version to be used in the Citations namespace, instead of requiring the strange indent2=*: parameter
  • maybe we should include the first indent as well, at least make this more consistent

Someone tech-savvy should have a look at this, and then we should be promoting this template actively on WT:" and convert all the RQ: templates to use it as well. H. (talk) 15:16, 8 September 2008 (UTC)

Re: comma: Wiktionary:Quotations currently mandates that comma. However, discussion at its talk-page suggests that some editors would prefer a colon.
Re: additional information: OMG, abandon all hope. The template is great for typical cases, but I don't think it can cover stuff like this. OTOH, might it be worth creating a special template for quoting from works of Plato?
Re: indents: Another option is to use HTML-style markup (<dl><dd>…</dd></dl>) to achieve the indent, so that the template will work regardless of its context.
RuakhTALK 17:02, 8 September 2008 (UTC)
This is a wikipedia import that is already trying to do way too much. The quotation itself should be taken out of the template, and the wikisyntax (*: etc.) as well; it should generate only the citation line, like the RQ templates. (And the specific RQ templates should be kept; they often have specific links that are useful.) Trying to "fix" it with crap like the "indent2=" parameter will just create more problems. Robert Ullmann 17:38, 8 September 2008 (UTC)
I use {{cite-book}} in the citations namespace (which is not the same as the clone of wikipedia's {{cite book}} which we also have clogging up the airways); it seems to perform adequately there. Conrad.Irwin 07:46, 12 September 2008 (UTC)
Then should I maybe put a warning on the template that it is not to be used? Or that certain parameters are deprecated (such as passage=)? I think indeed it could do good stuff if we confine it to the citation line only.
I hate that comma.
Note that it needs some additional parameters, such as volume= and chapter-title= (or, alternatively, chapter-number=). H. (talk) 09:27, 15 September 2008 (UTC)

preload templates broken

Seemingly, if someone tries to use a preload template to make a page that has an apostrophe in it, the apostrophe and everything after it get converted to a single ampersand. See Pickett&. Is there anything that can be done about this?—msh210 21:10, 10 September 2008 (UTC)

It isn't the preload templates per se, it is that Mediawiki:Noexactmatch is using <inputbox> to load them, and it in turn is not generating the correct URL. Most cases seem to give a server error at present. Robert Ullmann 14:53, 11 September 2008 (UTC)
Now bugzilla:15564 on Extension:InputBox. Robert Ullmann 15:46, 11 September 2008 (UTC)

Appearance of Wiktionary using Google Chrome

This may not be the best place for this discussion.

Anyway - When you have multiple tabs open under Internet Explorer (IE7) it won't normally let you flip between tabs when you are waiting for a response from one of them (normally Wiktionary). Google Chrome does not have this restriction, but on my screen setup it looks awful. It looks like it has been written on an old typewriter. My font setups are as follows: IE7 (Webpage - Times New Roman, Plain text - Courier New) Page => Text Size => larger. Google Chrome (Serif - Times New Roman, 16 pt, Sans Serif - Courier New, 16 pt, Fixed Width - Courier New, 16 pt). Poor quality screenshots are here => IE7 Google Chrome

Does anyone have any ideas how I could make Google Chrome look like IE7 (which is how I like it)? SemperBlotto 09:47, 11 September 2008 (UTC)

I think you've borked something with the font settings. Out of the box, on WinXP/SP2, it looks just fine. Did you import settings from IE7? Note that you say you have set Chrome to "San Serif -- Courier 16 new"; that is not what you want: sans serif should be a proportional font, e.g. Helvetica; Courier should only be for fixed width. Running text in the wikt, including the main page, is sans-serif. So with the settings you describe, the results are as expected ... (;-) Robert Ullmann 14:46, 11 September 2008 (UTC)
Yes, the Chrome installation copied settings from IE7. I don't seem to have any Helvetica fonts on my machine. After several experiments, Microsoft Sans Serif (18pt) gives the closest to my current setup - but I can't get the screen to look anything like the same. Perhaps I will have to wait for IE8. Or be a bit more patient when multi-tabbing. SemperBlotto 16:08, 11 September 2008 (UTC)
Okay... note that with the default settings (which I've been careful not to change) IE7 and Chrome are almost identical. The only difference is that Chrome does a bit better in spacing in some places (as does FireFox). You should be able to set up Chrome with the same fonts etc. as IE7, but that may not be complete yet (being a Beta). One thing I note is that it is slower than FF, and I can switch tabs in FF at times that I can't in Chrome (!). (I only use IE7 to test user presentation; which is also pretty much the only reason I use Windoze at all. ;-) Robert Ullmann 16:29, 11 September 2008 (UTC)
Well, now you mention it, Firefox also looks nasty on my setup - that's why I don't use it. I'm sure that my screen setup is different from most people's (having bad eyesight) and, as I have been using the Internet since its early days, I have got used to it looking the way I like it. I can remember having to change a few things when I migrated from Netscape Navigator (under Windows 3.1) but I can't remember what I did. P.S. Google Chrome has a built in spellchecker for use when editing - you can set it to UK or US English - or even Italian. SemperBlotto 07:37, 12 September 2008 (UTC)
Setting the Sans Serif font to Arial, 18pt will give you the best match to your current settings. 85.12.64.148 14:19, 12 September 2008 (UTC)

The ultimate dictionary

The following list is a suggestion of upgrades for Wikimedia:

1) Black wallpaper with colored fonts, for energy and eye preservation.

Umm, no thanks! You can customise your Special:MyPage/monobook.css if you really want to inflict that upon yourself.

2) Crossword research, with some paronym suggestions like in Google.

Could be interesting, maybe start a sample discussion on the Wiktionary:Information desk and see what happens, you can always split it off if it generates too much traffic.

Information desk answer = →The ultimate dictionary: remove crossposting (see Grease Pit)

3) Print link for all objects (arrays, pictures, schemes), with translations (e.g. foreign meronyms).

Yikes! What are arrays and schemes? We don't host images directly, so that'd be quite tricky; you can get all images from http://commons.wikimedia.org/

4) Links towards articles to be completed, and external links to include in the dictionary.

We have WT:RAE and the same for other languages, not sure what a list of external links to include would be for.

5) Photo, sound and video editing interface, with history.

Yes, this would be nice; we need someone with video experience for the sign-language stuff, and more people with microphones to record words.

6) An option for page reading, with several voices to choose from, and the possibility to read some pages with video plugins in addition (karaoke, colored fonts, films, video algorithms). Speech recognition compatibility would also be ergonomic.

Umm, this is provided by operating systems for those interested, I doubt we could provide anywhere near high enough quality readings of all entries.

7) Download version for PDA and mobile phone with regular updates.

This has been proposed before, User:Hippietrail is probably the best to ask. If you are just looking for a light-weight wiktionary though, see http://ninjawords.com .

There's also Moulinwiki [1]

8) Addition of new models: false friends, transparent words, and real friends or international paronyms.

This can be done (we don't have structured data, so models is probably the wrong word), just add relevant sections to words as you find they need them.

Sorry, model is a French false friend; actually I meant useful template.

9) Listing requests from definitions, e.g.: hyponyms of horse in 2 selected languages, all English + French + Spanish false friends, all etymologies of the words ending with the -logy suffix, in alphabetical order, by word occurrences in the selected language, or by year of appearance in all languages. JackPotte 21:41, 13 September 2008 (UTC)

Umm... We'd need well-structured data to do stuff like that. Wiktionary is all in a plain text format (for better and for worse). Conrad.Irwin 22:16, 13 September 2008 (UTC)

JackPotte 20:58, 17 September 2008 (UTC)

10) Bar showing degrees of holonymy, meronymy, hypernymy and hyponymy. For instance, horse [2] would be the 2nd degree meronym of equidae [...] and the 14th degree of animal. JackPotte 22:23, 20 September 2008 (UTC)

MediaWiki:Blockedtext

This is missing several of the provided parameters, including in particular $6, which is the expiration time of the block. Random832 00:20, 14 September 2008 (UTC)

This is omitted on purpose. Most blocks here are of vandals, and we don't wish to advertise to vandals how long they have to wait to vandalize again. --EncycloPetey 00:24, 14 September 2008 (UTC)
What are the other parameters? (And how do you know? I mean, where can you check what they are?)—msh210 16:29, 15 September 2008 (UTC)
See [[mw:Manual:Interface/Blockedtext]]. (Not all pages in the MediaWiki: namespace are very well documented — or were last I checked — but many are.) —RuakhTALK 01:02, 16 September 2008 (UTC)

The symbol #

How can I create an article on the symbol #? It currently links to the main page. I have added info to octothorpe, where the symbol is discussed, but have been repeatedly reverted.

Thanks, Kwamikagami 02:13, 16 September 2008 (UTC)

As that character is not allowed in Wikimedia page titles, we put information about that character (and others) in Appendix:Unsupported titles. You should be able to link to one of the subheadings there. --Bequw¢τ 05:29, 16 September 2008 (UTC)
Ah, I couldn't figure out how to link to it without getting an error. We can't link to subsections, though, correct?
Another possibility would be to use the Chinese double-wide characters, such as ＃. Would that not be appropriate? Kwamikagami 05:43, 16 September 2008 (UTC)
It would be a horrible hack; it might not display correctly in some contexts and would cause confusion for those trying to input the symbol (copy and paste would be "broken", typing wouldn't type it). The double-width characters are separate code-points, and so it is best to treat them as distinct characters. (We now have to have a note on \ because someone in the deep-dark-distant past thought it would be a good idea to overload a code-point.) A better solution would be (in my opinion) to use U+0023 as the title, though that's fairly ugly too. Conrad.Irwin 07:40, 16 September 2008 (UTC)
Well, now that I understand how to link to it, it's no longer really an issue. Kwamikagami 08:41, 16 September 2008 (UTC)
I don't think [[Appendix:Unsupported titles]] is a very good long-term solution. That page is already fairly large, but it covers just the tiniest part of what we need it to. I think we should have separate appendices for all these characters, and then [[Appendix:Unsupported titles]] can link to those. —RuakhTALK 14:04, 16 September 2008 (UTC)

I added some hidden anchors, so for now at least you can link to, e.g., Appendix:Unsupported titles#octothorp.

In terms of written presentation, the headings on this page are not very useful to say the least (a space as the header text?). Can't we just pick a representative name for each symbol, for example “# (octothorp)”? Michael Z. 2008-09-16 15:58 z

Is there some kind of theorem that says that some types of programming languages must have keywords that limit content? DCDuring TALK 16:21, 16 September 2008 (UTC)
First, this is a markup language, not a programming language. And no, there is no requirement that "content" be restricted. (For the answer for programming languages, note that PL/I is an example of an ordinary language with no reserved keywords.) Look at HTML (also a markup language): the reserved characters (< > &) have escapes (&lt; and of course &amp; itself). The issue is that Wikitext doesn't provide a full escape mechanism for page titles (it does for content); the page titles are supposed to be encyclopedia entry titles; using them as exact spellings of dictionary entries is stretching it a bit. It does not allow &#35; as a page title. (Note that that also contains "#" ... and HTML doesn't provide a named entity for # because it isn't a syntax character in HTML except inside a numerical reference ...) Robert Ullmann 16:44, 16 September 2008 (UTC)
Thanks for the helpful reply to a poorly expressed question. Is there a prospect of there being such an escape mechanism for headwords? If not, could we have a workaround that allowed searches to work using the various reserved characters as they might be entered, with some kind of substitution table for the offending reserved characters? As it is now, the special characters entered in the search box do not lead users to the appropriate appendix or, indeed, any pages at all. So that appendix is mostly useful to editors as a reminder of what characters must be excluded. How hard this is, where this fits in as a priority, and on whose plate it would fall are all questions beyond my paygrade.
As a byproduct of some approaches to addressing this, we might get the ability to search for wikitext that contains special characters for maintenance purposes, a particularly useful capability in those intervals between dumps. DCDuring TALK 17:25, 16 September 2008 (UTC)

XML dumps

The last XML dump that "they" have managed to produce is 3 months old: 13 June 2008

I don't know the specifics for other people, but I myself have spent several hundred hours coding various compensations for the severely stale dump. In hindsight, I would have done far better to build a local mirror in July.

This is also an extreme breach of the purpose of the WikiMedia Foundation; it is not to produce a live encyclopaedia or dictionary website, it is to produce re-useable content under GFDL. The website(s) don't cut it, live mirrors are prohibited. The XML dumps are the GFDL publication of the content.

That aside, to continue: I think we will have to build our own (non-"live") mirror, keep it up to date, and produce compatible XML files. Any ordinary box with a gig of disk can start from the 13 June XML, catch up quickly (hours), and stay current with minimal network traffic; it can then spin a new XML dump every 24 hours and upload it somewhere. (Could do every 2-3 hours, but I don't think that is needed?)

(In case you are wondering about the update rate, my laptop running AF reads all the patrolled edits, with a very small fraction of the rather limited bandwidth and latency I get here; it could probably run this with no trouble ;-)

I don't know why Brion and Tim can't get this together, but after many, many complaints we should find an alternative. My time alone spent compensating is at least 20 times what it would take to fix the problem for WMF. (Another idea is for me to just rent a machine with a Tbyte or two from an ISP somewhere, and maintain XMLs for the whole project set ... maybe that would be too much load on the servers? Dunno. Would take less of my time than what I am doing now. I am not volunteering to do this!)

Anyone have other thoughts or ideas? Robert Ullmann 01:19, 17 September 2008 (UTC)

If relatively little effort or other resources are required, then it would seem that there must have been some kind of decision not to do the XML dumps, either so that they are not available to some specific class or classes of dump user or to free up resources for other very high-priority purposes. Could anyone be getting XML dumps by some other means? Could this be part of some kind of hardball negotiation process? Do we have any idea of what the WMF techs are working on? Is this a matter for airing on one of the WMF mailing lists? DCDuring TALK 02:00, 17 September 2008 (UTC)
(see for where it was left on 1 August) I have no idea what priorities they may or may not have; but (as I noted above) this or some other re-useable GFDL publication of the data is mission critical, and 3 months (for the en.wikt) or 45+ days (at least, for everything else) is completely beyond any acceptable operational problem resolution time. You can complain where-ever you want; AFAIK it has been raised many times, by many people, in every forum they can locate or think of. My question is what we do about maintaining our project ... 02:13, 17 September 2008 (UTC)
Our own fundraising? I'm reluctant to recommend that those providing their time and skills also be hit up for funds. Would any of the folks using Wiktionary content commercially let their servers be used for the updating? I would think they might value it for any modest lead-time advantage (hours, days, certainly not weeks) they could get over others who use our content. Having the wear and tear be on someone else's server and communications lines would be desirable. Is there any reason that couldn't or shouldn't be done? DCDuring TALK 02:34, 17 September 2008 (UTC)
If anyone is inclined to contribute funds, they should send them to WMF, with a note asking that this be fixed. (;-) Finding the resources to do updates, upload, etc. is not a problem (for en.wikt only). I am wondering if anyone can come up with other approaches? Robert Ullmann 18:18, 17 September 2008 (UTC)
195,337 pages have changed/been added since that dump - I can get a list of them using SQL on the toolserver; so we can get the dump reasonably up-to-date. Once we have the dump within the limit of recent-changes (about a month), then we may be able to use a script by User:ArielGlenn which uses recent changes to update the dump in batches. All that remains now is to find a host - though I believe Amgine may have a spare server if we were to ask nicely. Conrad.Irwin 19:29, 20 September 2008 (UTC)
Ok, we have the beginnings of a plan. At the moment I am running User:ArielGlenn's script on User:Amgine's server. This will give us (in a couple of hours) all the information necessary to construct an up-to-date dump; though whether we can persuade the two dumps to merge themselves painlessly remains to be seen. Conrad.Irwin 00:46, 21 September 2008 (UTC)
I've been experimenting with creating a new dump. It can be done, and I have made two; however they are both wrong: the first has too many pages, as it contains all the pages that were deleted after the old dump and before the new; the second has too few, because I have a bug in the script that compared the xml dump with the list of current pages. The dump was generated something like this:
  1. Get a list of changed pages "SELECT DISTINCT page, namespace FROM pages, revisions WHERE rev_page = page_id AND rev_timestamp > 2008062200000"
  2. Cat it through some sed's to convert namespace into the prefix and some greps to remove Talk: space etc.
  3. Run an awk script to "merge" the dumps: read each page from each, then print all the page-titles using the xml fragment with the highest timestamp
  4. (Get a list of all pages)
  5. (Run the dump and pagelist through a similar awk script that removes all pages not in the list - seems to delete too many pages, but I can't work out why)
Most of this stuff in very raw form can be found at http://devtionary.info/w/dump/updater/ (the one with too many pages is enwiktionary-pages-articles-20080921.xml, and the one with too few (I think) is enwiktionary-deleted.xml). Ideas and suggestions would be more than welcome, as once we have an accurate and reasonably current dump (which it might be easier to just generate using the all pages list, now I think about it again) atglenn has a magic script which will get the list of changed pages from recentchanges and merge it into the dump. This is all a bit Heath Robinson at the moment, so some ideas would be useful. Conrad.Irwin 00:19, 26 September 2008 (UTC)
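(For the merge in step 3, a rough Python equivalent, assuming both inputs are pages-current dumps whose pages fit in memory; xmlreader here is the module from the pywikipedia framework, and the filenames are only illustrative:)

    # Keep, for each title, the <page> fragment with the newest
    # timestamp, as the awk merge does.
    import xmlreader  # from the pywikipedia framework

    def merge(*dump_files):
        pages = {}
        for f in dump_files:
            for entry in xmlreader.XmlDump(f).parse():
                old = pages.get(entry.title)
                # MediaWiki timestamps are ISO 8601, so plain string
                # comparison orders them correctly
                if old is None or entry.timestamp > old.timestamp:
                    pages[entry.title] = entry
        return pages

    merged = merge("enwiktionary-20080613.xml", "changed-pages.xml")
    print len(merged), "pages kept"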
My thoughts were something like this:
  • set up a local db: key=pageid, record=(revid, content, the other attributes in the present XML format)
  • load the last XML dump (13 June) all-pages-current (not just NS:0)
  • for pageid from 1 to current, do API queries like this (with more than 10, perhaps a few hundred at a time; current is about 1.2 million, so perhaps 500 or so queries along the way)
  • if pageid is missing, make sure row is deleted from local db
  • if pageid not in db, or revid different from db (not higher, different: latest rev may have been deleted!) then get current content (in batches of 20-50, same sort of query with rvprop=content|etc)
  • at end, write the XML (can do both NS:0 and all, or the NS:0+(Template, Wiktionary, Help) of the existing pages-articles) in pageid order, so just as the existing one
After the first run (the 197K pages), this should only take 30 minutes or so (depending on lots of things ;-) So run every 24 hours. I would just do this here, but my net connection has a lot of latency and shared bandwidth, so it would take a while, and then the dumps would be on the wrong end of the same link, so we would have to upload somewhere anyway. It might track RC, but might be just as good to simply read all of the present revids for each pass. Robert Ullmann 09:35, 26 September 2008 (UTC)
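(A minimal sketch of the revid-checking step of that plan, querying api.php with prop=revisions&rvprop=ids; Python 2 with simplejson, and the batch of pageids, shelf layout, and db filename are all illustrative:)

    # For a batch of pageids, fetch the current revids from the API and
    # compare against the local copy; anything different needs a refetch.
    # Deleted pages come back without a 'revisions' key.
    import shelve, urllib
    import simplejson  # or the stdlib json module in Python 2.6+

    API = "http://en.wiktionary.org/w/api.php"

    def live_revids(pageids):
        params = urllib.urlencode({
            "action": "query", "format": "json",
            "pageids": "|".join(str(p) for p in pageids),
            "prop": "revisions", "rvprop": "ids",
        })
        data = simplejson.load(urllib.urlopen(API + "?" + params))
        result = {}
        for pid, page in data["query"]["pages"].items():
            if "revisions" in page:  # absent for deleted/missing pages
                result[int(pid)] = page["revisions"][0]["revid"]
        return result

    db = shelve.open("pages.db")  # key: str(pageid) -> (revid, content, ...)
    live = live_revids(range(1, 51))
    stale = [pid for pid, revid in live.items()
             if db.get(str(pid), (None,))[0] != revid]
    print "need refetch:", stale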
Question: do we have to/should we duplicate the WMF download format exactly, or is something compatible with the python dumpparse module good enough? (I suppose the site header can just be copied, but there are other details like including the edit restrictions on page and such.) Should we make an NS:0 only dump? Do we want any more than that? Robert Ullmann 10:08, 26 September 2008 (UTC)
I've written an XML dump updater that reads a dump, adds/updates/removes entries to get (closer) to current. With nearly 200K updates from the last dump, it would/will take quite a while. If we can get up to date, we can then stay there, producing a new dump every day. Do we have something close, in pageid order? Robert Ullmann 15:46, 28 September 2008 (UTC)
Thanks so much for doing this. I would have thought that most of the value comes from NS:0, at least if we ever get regular dumps again on the every-two-week basis. Is there anything about doing only NS:0 that would make it difficult to go back and do the rest later, should it prove necessary? Is there any risk of two processes (NS:0 and all other) getting out of sync? DCDuring TALK 19:25, 28 September 2008 (UTC)
It is a single filter in the update program, currently set to '0' and '10' (the namespace number is a string at that point). If it is changed, the update will automatically add and drop pages as needed. The data file is in page-ID order, regardless of namespace. (I.e. "namespace" is just a minor attribute of each entry; they aren't stored separately or such.) Robert Ullmann 13:09, 1 October 2008 (UTC)

OK: where should I run this? Which namespaces other than 0 and 10 (Template) might be needed? The location needs to be able to sustain a reasonable number of downloads. What do you use them for? Code is ready. 23:27, 29 September 2008 (UTC)

As to the last question. I'm a big fan of your "Missing", "Not counted", and non-std. header analysis pages, and CM's similar pages, all NS:0, AFAICT. DCDuring TALK 00:23, 30 September 2008 (UTC)
Agreed. And it would be nice to update at least the first and third sections of Wiktionary:Statistics, so we have an idea of where we stand WRT recent work in various languages. --EncycloPetey 00:29, 30 September 2008 (UTC)

Conrad has set me up with an account on Amgine's server (devtionary.info), and I've retested and run the update program. Some notes:

  • the software updates a limited number (some thousands) of entries on each run, so it will take a while to catch up
  • in the meantime, each successive dump improves on the present 13 June dump, picking up edits and new pages
  • the first update (20080930) includes about 5K page updates, and all of the templates to current
  • the dump includes namespace 0 and namespace 10 (template), this is what is needed to interpret/reuse/analyse content
  • the dump is not a snapshot; it is page versions over a period of time; at present that is over nearly 4 months, but that will be reduced to a few days
  • I will be adding code to pick up revision info from RC, so when the dump is near-current it will be very close to a snapshot
  • I'm not planning on archiving old versions; if you want to capture it once a week/month whatever please do
  • I plan to run it once a day

See http://devtionary.info/w/dump/xmlu/. Remember, right now this is just a small increment on the 13 June dump. Robert Ullmann 17:31, 30 September 2008 (UTC)

Notes: there is some issue with characters not on plane 0; I'm looking for it; python on that system tests as a "wide" build, i.e. UCS4, but there is still some problem. The other issue is that I've added the RC support, but this causes a (very bad!) effect: when someone adds a bad entry ("Mr. X in the history department is a gay [redacted] and likes to [redacted]" :-) it gets picked up with celerity, but not deleted until cache expiry ... as I said, not good. Will fix this. Robert Ullmann 23:36, 1 October 2008 (UTC)
Was a bug buried fairly deep (but this being Ubuntu Linux, one can look at everything ;-): shelve uses pickle with protocol=0, which mishandles wide unicode in this case. Fixed by using protocol=2 (better/faster anyway). Did not affect data. The program also reads the deletion log, but that doesn't show the ID that was deleted, only the title, so it doesn't work as well as I'd like. (Can still be improved.)
The process has current updates, and has caught up to about 1/5 of the stuff from the last 4 months; new dump in a little while. Robert Ullmann 15:31, 2 October 2008 (UTC)
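(For anyone who hits the same thing, the fix is a single parameter at shelf-creation time; a tiny sketch, with the filename and record layout illustrative:)

    # shelve defaults to pickle protocol 0, which mishandled wide
    # (UCS-4) unicode here; protocol 2 is binary and handles it fine.
    import shelve

    db = shelve.open("pages.db", protocol=2)
    db["page:12345"] = (67890, u"text with a non-BMP character: \U00010330")
    db.close()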
Many of us are looking forward to a usable dump and, especially, the many useful lists that such a dump enables. And, just as washing a car is thought to be one of the best ways to end a drought, the successful completion of your efforts and your strong mojo are likely to cause WMF dumps to get back on schedule. DCDuring TALK 15:58, 2 October 2008 (UTC)
IIRC, the WMF intended dump schedule isn't as good as RU's, anyway. ;-)   —RuakhTALK 16:11, 2 October 2008 (UTC)
They are supposedly installing the new servers today, and may start running dumps soon. But there is no indication that they will fix the queueing problems that often see the 90-minute en.wikt dump we want stuck behind the 6-week-long en.wp all-history dump. If they do manage to produce new dumps every few weeks, fine; we can then easily continue to spin dailies. (2 Oct available in an hour or two.) Robert Ullmann 16:21, 2 October 2008 (UTC)

Credit where due: while I have spent a dozen pleasant hours writing and tweaking code, credit must go to Amgine for providing a platform and Conrad for setting up access etc. Robert Ullmann 00:12, 3 October 2008 (UTC)

Is there a reason that the dumps produced don't have a root node? They are lists of 'page' nodes rather than having a root like 'mediawiki'. It's not much of a problem to fix them for analysis, I was just curious. --Bequw¢τ 10:05, 28 November 2008 (UTC)
It is just simpler not to have to copy and update the "header" on the file, given that all the s/w I know of that reads it is just looking for a set of pages. (The Readme.txt says this.) If you need it I can add something. (And then I should properly add the close tag at the very end, which isn't there either. ;-) You have s/w that uses it? Robert Ullmann 10:28, 28 November 2008 (UTC)
I use AWB and pywikipedia, which both aren't smart enough to deal with an XML file without a root. Only worry if there are others that would like to use these tools; I'm fine. --Bequw¢τ 07:27, 30 November 2008 (UTC)
(pywikipedia xmlreader.py works just fine for me?) Anyway, I've fixed this to generate the "header" and the close tag at the very end; if you get the 1 December dump (and following, of course) it should be fine. Tell me if it isn't right. Robert Ullmann 15:43, 1 December 2008 (UTC)
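(And if you have one of the older root-less dumps and a tool that insists on well-formed XML, wrapping it yourself is trivial; a sketch with illustrative filenames, omitting the attributes a real <mediawiki> header carries:)

    # Wrap the bare list of <page> nodes in a root element so strict
    # XML parsers (AWB and the like) will accept the file.
    def add_root(src, dst):
        out = open(dst, "w")
        out.write("<mediawiki>\n")
        for line in open(src):
            out.write(line)
        out.write("</mediawiki>\n")
        out.close()

    add_root("enwiktionary-rootless.xml", "enwiktionary-wrapped.xml")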

Devtionary use for live mirroring?

I have no problem providing bandwidth and platform for Devtionary, and was wondering if it might be useful/possible to set it up to livemirror en.wiktionary?

Template:temp

Would it be possible to adjust {{temp}} so that when it receives a "lang=" parameter it displays it as text rather than interpreting it as a parameter? e.g. {{temp|legal|lang=fr}} would display as {{legal|lang=fr}} rather than as {{legal}}. Thryduulf 16:06, 17 September 2008 (UTC)

You do know the general way to do this? Write {{temp|legal|2=lang=fr}}. In general, use explicit n= if the parameter contains an "=". Works with all templates. (And it is good to remember that the script templates typically generate strings containing =, so if you use one within a numbered parameter in another template, you will need this.) Robert Ullmann 18:11, 17 September 2008 (UTC)
That's how I do it, but it's not perfect, since if you do it for one parameter, you have to either do it for all the parameters after it, or precede it with a dummy empty parameter that gets overridden. By which I mean, {{temp|foo|2=bar=baz|bip}} produces {{foo|bip}}: the bar=baz gets overridden by bip by virtue of the latter being the actual second unnamed parameter, and appearing after it; so one has to put either {{temp|foo|2=bar=baz|3=bip}} (which produces {{foo|bar=baz|bip}}) or {{temp|foo||2=bar=baz|bip}} (which produces {{foo||bar=baz|bip}}). The whole thing almost defeats the point of using unnamed parameters (which is why I mostly avoid them in templates I create). Another approach that I've seen someone use — either Conrad.Irwin or Mzajac, I think — and which people might find more intuitive is to use <nowiki>=</nowiki>, e.g. {{temp|foo|bar<nowiki>=</nowiki>baz|bip}} (which produces {{foo|bar=baz|bip}}). Similarly, one can use a numeric character reference, as in {{temp|foo|bar&#x3D;baz|bip}} (which produces {{foo|bar=baz|bip}}). Unfortunately, neither of these approaches works for the situation you describe with the script templates. —RuakhTALK 01:55, 18 September 2008 (UTC)

merging Template:see (also) and Template:xsee (aka xalso)

I'm putting together a template that will do both, so we don't need two variants (sometimes in the same entry). Presently at User:Robert Ullmann/t.

My intent is to make this the new {{also}}, and then continue converting {see} to {also}, and also convert {xsee} to {also}. I still need to check all the uses of {xsee} to make sure it is understood. Robert Ullmann 18:15, 17 September 2008 (UTC)

Decided this was a bad idea. Technically not a problem, but there are only a small number (a couple of hundred) of pages that use {xsee}, and 10's of thousands that use {see}/{also}. So not worth the overhead. (For now; if we could get a wlink: parser function, that would be different.) Robert Ullmann 16:22, 8 October 2008 (UTC)

Navigation issues

The navigation box doesn't seem to be working: Mainpage-url doesn't go anywhere, nor does Discussionrooms-url. Teh Rote 16:15, 18 September 2008 (UTC)

Could this be related to some odd changes in RC? I noticed today that (1) the "Main page" link there no longer works; the link says "Mainpage-text" and points to [http://en.wiktionary.org/wiki/INVALID-TITLE], which it didn't do yesterday. (This problem seems common to every page.) (2) There is no longer a running count of the number of entries on Wiktionary in the header text. --EncycloPetey 19:25, 18 September 2008 (UTC)

Amgine has been doing something with the sidebar text; I'm not sure what. Robert Ullmann 19:30, 18 September 2008 (UTC)
Amgine is on in the IRC. I'll ask. --EncycloPetey 19:32, 18 September 2008 (UTC)
Neither Amgine nor Brion knows what is causing the problem. --EncycloPetey 19:33, 18 September 2008 (UTC)
The problem is the "null" edits Amgine made to the URLs. Adding a blank line above the expected line is not a "null edit"! Rolled those back, so the links work. Now we need to fix the text. Robert Ullmann 19:35, 18 September 2008 (UTC)
Don't know why MediaWiki:Mainpage-text isn't getting picked up again? So we are getting "Mainpage-text" instead of "Main Page"? Robert Ullmann 19:41, 18 September 2008 (UTC)
I deleted the sidebar/en copy that Amgine had added; but that isn't fixing it; it all looks right now (the configuration, not the result!). Caching? Does everyone else still see "Mainpage-text" and "Discussionrooms" (with no space)? Or is it fixing itself? Robert Ullmann 19:45, 18 September 2008 (UTC)
Still looks wrong for me, but that may be related to caching, as you say. --EncycloPetey 19:47, 18 September 2008 (UTC)
Wrong for me as well. I'm annoying #-tech with trying to get it fixed, and they have been doing something which they are not confiding to a mere plebe such as myself. - Amgine/talk 19:56, 18 September 2008 (UTC)
Ah, purging those messages themselves seems to have worked nicely ;-) Robert Ullmann 19:58, 18 September 2008 (UTC)
<grumbles about calling the cavalry> Thanks Robert. - Amgine/talk 20:01, 18 September 2008 (UTC)

Recent Changes

Up until today, the current page count displayed in Recent Changes. It no longer does. --EncycloPetey 20:11, 18 September 2008 (UTC)

Should be fixed now. Robert Ullmann 23:47, 18 September 2008 (UTC)
It is. Thanks. --EncycloPetey 00:21, 19 September 2008 (UTC)

This page was obviously deleted by someone, but there are no entries in the deletion log- anyone know what's causing this? Nadando 20:39, 18 September 2008 (UTC)

It does not appear in the delete log. Something odd going on. Robert Ullmann 05:29, 21 September 2008 (UTC)

MediaWiki:Noarticletext

It used to be that when I'd visit a redlink, I'd see the contents of [[MediaWiki:Noarticletext]]. How come that's no longer the case? It still seems to work for Wikipedia.

Also, unrelatedly except that I discovered it while trying to look through Wikipedia's MediaWiki pages, how come [[Special:PrefixIndex]] copies "from" to "prefix" when the latter isn't specified, rather than assuming an empty string? That's a bug, right?

RuakhTALK 17:13, 20 September 2008 (UTC)

I dunno, I tried &action=purge on the page and it seems to be back now. Probably just an issue with the message cache. Conrad.Irwin 17:57, 20 September 2008 (UTC)
Cool, thanks. MediaWiki seems to have gone downhill in this regard. —RuakhTALK 19:16, 20 September 2008 (UTC)
Indeed, someone apparently tried to combine PrefixIndex and AllPages in some way about a month ago, and succeeded only in severely breaking both; despite repeated requests they have not been restored. The people "working" on the s/w seem to be spending their time making things "neater", or adding "features", breaking things as they go; fixing the 1000s of outstanding bugs is not a priority. With volunteers doing things, this will tend to happen; it takes project direction to control it. Robert Ullmann 05:21, 21 September 2008 (UTC)

Extension:Nuke

Fyi: There has been a decision at m:Metapub (link to old version) to add mw:Extension:Nuke to all WMF wikis that don't opt out of it. I know nothing more (not, e.g., when it will be implemented, or how the opt-out procedure will work).—msh210 19:06, 24 September 2008 (UTC)

I would hope that there would be an option to mass restore as well as delete if this were implemented. Nadando 21:52, 24 September 2008 (UTC)
There is no "mass restore". I'd say quite strongly that we want to opt out; we have not had any serious need for something like this; if we did, we have the technical ability to script it when needed (e.g. I essentially have the code; others can do it). And there is no need for a steward to use it to clean up here; we take care of ourselves. (I don't suppose I need point out the downsides of WF acquiring this?) Why it is not restricted to stewards and 'crats is beyond my comprehension. Robert Ullmann 06:32, 25 September 2008 (UTC)
If there is no mass restore, and insufficient restriction as Robert has noted, then I agree solidly with him that we should opt out. We don't have a need for this, and there is too much potential for harm. --EncycloPetey 06:35, 25 September 2008 (UTC)
I, too, feel a bit of apprehension at such a thing. I see no benefit, and a decent amount of risk involved. Please, let's opt out. -Atelaes λάλει ἐμοί 06:38, 25 September 2008 (UTC)
I'm not involved in such technical aspects of working on this project, but I am sympathetic to the concerns raised by other, more experienced editors (Robert Ullmann, EncycloPetey, Atelaes). __meco 08:03, 25 September 2008 (UTC)
I'm with meco: opt out!—msh210 17:45, 25 September 2008 (UTC)
Should we start a !vote to opt out, or should we wait until we know how the opt-out procedure will work? —RuakhTALK 11:02, 25 September 2008 (UTC)
The question has been asked on Metapub by ru.wp and by me. There doesn't seem to be a list or such yet. Robert Ullmann 11:14, 25 September 2008 (UTC)
Is there an implementation date scheduled? DCDuring TALK 11:18, 25 September 2008 (UTC)

Now there is an opt-out list. We are listed on it, but as unconfirmed (it says "link?", apparently seeking a link to where we voted to opt out). I haven't time to draft a vote page for this.—msh210 16:47, 7 October 2008 (UTC)

RC - current number of entries

The count seems frozen. Normally, it climbs as new "good" articles are added, but it's not changing right now. Is this related to the slow response and numerous server errors today, or is it perhaps a separate problem? --EncycloPetey 21:40, 24 September 2008 (UTC)

Addendum: On the IRC, Nadando indicated that this was a deliberate disabling while MW fixes happen. --EncycloPetey 06:36, 25 September 2008 (UTC)
If anyone is interested in some of the gory detail, see this. One thing to note is that when the counter runs again, it will have missed the new entries added in the meantime, until an update is run. So don't expect it to jump immediately to the correct value when it starts moving. Robert Ullmann 11:03, 25 September 2008 (UTC)

Seems to be working again, although the count is still off (how often are updates run?) Nadando 07:10, 28 September 2008 (UTC)

The update is a one-off run, done when something has caused the counter to be in error. Don't know when they will run it. The problem may not be fixed yet. Robert Ullmann 15:39, 28 September 2008 (UTC)

Category:arc:Persian derivations

Hello, User:CyberSkull has created this template and edited Category:Persian derivations with some strange results, for example at Category:arc:Persian derivations. I would really appreciate it if someone could explain to me the purpose of it and the new format or structure for the etymology categories, and if this could be fixed, thanks. Pistachio 12:28, 25 September 2008 (UTC)

I have corrected the template to match other Derivations categories. --EncycloPetey 04:24, 26 September 2008 (UTC)
Thanks :-D How could I edit the new wording 'The following is a list of Aramaic words related to etymology of the Persian-derived words', from Category:arc:Persian derivations, and stop it from adding words into Category:arc:Persian language and Category:arc:Iran? Pistachio 07:37, 26 September 2008 (UTC)
I'm not sure about the answer to the first question. The topic cat setup has never been properly documented, so I usually have to dig around each time something like this happens. I expect that wherever the text is located, the category problem can be fixed from that location as well. --EncycloPetey 07:41, 26 September 2008 (UTC)
The topic cat templates were never really "officially" changed, although it seems like there has been a pretty substantial shift toward them. If there are specific areas where the documentation is lacking, or if you have ideas for some sort of unifying document, I'd be willing to work on it. I think that Template talk:topic cat was the closest I ever came to documentation, but one thing I never did was do a walkthrough or use case-type document. Mike Dillon 04:30, 1 October 2008 (UTC)

Smoothing out page titles

Please create MediaWiki:Pagetitle-view-mainpage with the text "Wiktionary, the free dictionary". This will cause the title bar on Wiktionary:Main Page to read simply "Wiktionary, the free dictionary" instead of "Wiktionary:Main Page - Wiktionary". The title bar on all other pages will remain the same. —Remember the dot (talk) 22:47, 26 September 2008 (UTC)

That's pretty cute; now all we need to do is stop this "page-contents is stuff that is spelt like the title" idiocy that's so omnipresent :). Conrad.Irwin 22:54, 26 September 2008 (UTC)
Thanks for creating the page for me! What do you mean by "'page-contents is stuff that is spelt like the title' idiocy"? —Remember the dot (talk) 23:02, 26 September 2008 (UTC)
Well... It's an irritation here that color is a seperate page from colour, that a has so many different sections that are not really related, etc. etc. It'd be nice to do something better, but I've yet to come up with a grand super-plan that didn't have a major flaw. Conrad.Irwin 23:04, 26 September 2008 (UTC)
Ah, I can definitely see how that would be a point of conflict. I hope you guys get it all worked out.
By the way, have you considered using Firefox? It has spell-check built in, so it'd help you avoid common spelling mistakes like "seperate". —Remember the dot (talk) 23:10, 26 September 2008 (UTC)
Spell-checks are only useful if you type common words in a single language. Wiktionary is multilingual and uses extensive coding. --EncycloPetey 23:14, 26 September 2008 (UTC)
True. Spell-check is more useful when leaving comments on discussion pages than when writing dictionary entries. —Remember the dot (talk) 23:29, 26 September 2008 (UTC)