Cleanup pages for pages that use 'term' without a language

Hi! Do you think you could add something to your bot so that entries that use {{term}} without specifying a language are added to a cleanup list? {{term}} doesn't actually default to English like many people might expect; instead it defaults to 'no language'. It's not strictly an error, but from a usability point of view it's less than optimal, and I think it would be better if most of those cases could be fixed.

I already tried to make a list using categories a few days ago, but there ended up being so many entries (well over a hundred thousand!) that it seemed hopeless. I think your bot might be better suited, because it can subdivide the entries and make them more manageable. Of course, we can't actually say what language {{term|water}} is supposed to link to... so I think for now the best way to subdivide the entries would be based on what language section the template appears in. So, if a German section contains {{term|water}}, add it to the subpage for German entries. (Wiktionary:Todo/Temp without language/German?) It would probably be a good idea to make further subpages by letter as well, so if Wasser contains {{term|water}} it would be listed on Wiktionary:Todo/Temp without language/German/w.
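A rough Python sketch of that bucketing, just to make the idea concrete. The function name cleanup_subpage and the rule "bucket by the entry's first character, lowercased" are assumptions for illustration, not anything an existing bot does:

```python
def cleanup_subpage(language, title, root="Wiktionary:Todo/Temp without language"):
    """Return the cleanup subpage for an entry, bucketed by language and first letter."""
    # Assumption: the bucket is simply the entry's first character, lowercased.
    return f"{root}/{language}/{title[0].lower()}"


print(cleanup_subpage("German", "Wasser"))
```

Titles that start with a digit or punctuation would each get their own bucket under this rule, so a real run might want to fold those into a single catch-all subpage.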

What do you think of this idea? Is it doable?

CodeCat 12:57, 7 June 2012

It's pretty simple to construct using regular expressions. I used =German==([^-]|-[^-])+\{\{term(\|(?!lang=)[^\|\}]+)+\}\} to construct one for German just using AutoWikiBrowser. (Did you want "term" in the title instead of "temp", or were you planning on expanding this to other templates that should have language parameters?) It's probably better for someone to just run this regexp for the languages for which there's cleanup interest (non-English would be the most bite-sized). Once the backlog has been whittled down, it can easily be turned into a periodic maintenance cleanup list.
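For anyone without AutoWikiBrowser, the same check can be sketched in plain Python. This is not the AWB regexp above but a simpler two-step variant: split the wikitext into language sections, then look at each {{term}} call. The names SECTION_RE, TERM_RE and terms_without_lang are made up for illustration, and the sketch assumes {{term}} calls contain no nested templates:

```python
import re

# Level-2 headings such as "==German==" open a language section;
# deeper headings like "===Noun===" are deliberately not matched.
SECTION_RE = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)
# A {{term}} call with at least one parameter and no nested braces.
TERM_RE = re.compile(r"\{\{term\|([^{}]*)\}\}")


def terms_without_lang(wikitext):
    """Yield (language_section, template_text) for each {{term}} lacking lang=."""
    headings = list(SECTION_RE.finditer(wikitext))
    for i, heading in enumerate(headings):
        # A section runs until the next level-2 heading or the end of the page.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[heading.end():end]
        for match in TERM_RE.finditer(body):
            params = match.group(1).split("|")
            if not any(p.strip().startswith("lang=") for p in params):
                yield heading.group(1), match.group(0)


sample = (
    "==English==\n{{term|water|lang=en}}\n\n----\n\n"
    "==German==\n===Etymology===\n"
    "Cognate with {{term|water}} and {{term|vatn|lang=is}}.\n"
)
print(list(terms_without_lang(sample)))
```

Reporting the section name alongside each hit is what allows the results to be grouped per language afterwards.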

Bequw τ 12:48, 16 June 2012

Oh yes, that was a mistake, sorry. I meant 'term'. I can't use AWB because it's only for Windows, but I could probably do the same with Python. I don't really know how the XML dumps work, though. Could you explain it a bit?

CodeCat 13:39, 16 June 2012

mw:Manual:Pywikipediabot has an xmlreader module that you can use to parse the XML dump into pages. Other tools are mentioned at m:Data dumps.
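If you'd rather not install anything, a dump can also be walked with Python's standard library alone. The sketch below does not use xmlreader; it uses xml.etree.ElementTree instead and assumes a current-revision dump (one <revision> per <page>). The helper names local_name and iter_pages are invented for the example:

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">'
    b"<page><title>Wasser</title><revision>"
    b"<text>==German==\n{{term|water}}</text>"
    b"</revision></page></mediawiki>"
)
for page_title, page_text in iter_pages(io.BytesIO(sample)):
    print(page_title, len(page_text))
```

For a compressed dump, passing a file object from bz2.open(path, "rb") instead of a filename should work the same way.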

Bequw τ 13:52, 16 June 2012