User talk:Ruakh/2013


January 2013


What does the ISBN code mean in this edit? Pass a Method (talk) 20:48, 7 January 2013 (UTC)

See w:ISBN. —Μετάknowledgediscuss/deeds 21:11, 7 January 2013 (UTC)

yákhas ót l'rá'ash

Are you sure about the transliteration? Laráash (לָרַעַשׁ) sounds better to my ears than l'ráash, but I may be overly influenced by my greater familiarity with Biblical Hebrew than with modern. (Cf. Gesenius, 102h.)​—msh210 (talk) 06:57, 9 January 2013 (UTC)

I was confident enough to put it in the entry, but no, I'm not sure enough to call myself "sure about" it. —Ruakh 06:03, 10 January 2013 (UTC)
Good enough for me. I didn't know whether you had put that transliteration in advisedly.​—msh210 (talk) 06:47, 10 January 2013 (UTC)

Template:list:Hebrew script letters/he

Hi, Ruakh. I'm trying to convert templates to the new format used in User:CodeCat/list helper which is less resource-intensive. But with this template I am having some problems, because my browser is acting strange with the Hebrew characters. When I press enter it changes the order of the characters, and I don't know the Hebrew alphabet so I am afraid to mess up and put the letters in the wrong order by accident. You probably have more experience dealing with such things so could you try? Template:list:Latin script letters/en has an example you can work from, but Hebrew doesn't use letter casing so there would probably be only one letter per line. —CodeCat 02:20, 11 January 2013 (UTC)

I gave it a whirl. I'm not really sure how it should look; I kept the letters in the left-to-right order they already had (which is kind of backward-looking, since Hebrew is read right-to-left, but not a huge deal), except that since the Latin uppercase letters were unseparated from their lowercase counterparts, I did the same thing for Hebrew medial and final forms, so for those I had to put them in right-to-left order, because ךכ looks like an April Fool's prank. Feel free to reverse the overall order, or make any other tweaks, or anything. —Ruakh 02:40, 11 January 2013 (UTC)
Thank you! I will trust your judgement when it comes to Hebrew: I know nothing about it, so I can't judge how it should look or what looks good or bad. If you think the order should be reversed, that is OK, but keep in mind that English users will expect list elements to be ordered left to right, even if the individual items are read right to left. Template:list:days of the week/yi was made that way too. —CodeCat 02:46, 11 January 2013 (UTC)
Personally, I think it looks weird and would prefer right-to-left. days of the week/yi is acceptable to me, but that's because English-speaking users might be going to those pages for their semantic value, in which case left-to-right order is most logical. This is going to be chiefly used for the letters' value in writing, not semantic meanings, and users who are looking at the pages are more likely to know that Hebrew is right-to-left. It would then reflect the universal presentation of the script letters, rather than our idiosyncratic mixture of directions, as if someone was taking xkcd seriously. —Μετάknowledgediscuss/deeds 03:06, 11 January 2013 (UTC)
Note that the Arabic alphabet lists (Template:list:Arabic script letters/ar, Template:list:Arabic script letters/fa, etc.) are currently right-to-left. --WikiTiki89 19:23, 11 January 2013 (UTC)
FWIW, I would support putting the Hebrew letters of the alphabet in, erm, alphabetical order, right-to-left. I find it rather surreal that they're listed left-to-right. - -sche (discuss) 04:41, 13 January 2013 (UTC)
  Done. Μετάknowledgediscuss/deeds 05:35, 13 January 2013 (UTC)
I've reverted and re-done it a different way, I hope you don't mind. (Putting the letters in reverse order, while forcing that order to be presented LTR, seems rather hackish to me. Logically, it makes more sense to put the letters in the correct order, presenting it RTL. Please revert if there was a reason for the other approach.) —Ruakh 06:23, 13 January 2013 (UTC)
I think HTML already treats Hebrew text as RTL by default, so the RTL markers probably aren't necessary. —CodeCat 13:05, 13 January 2013 (UTC)
You are correct, but I included them because I think that maybe our templates should include LTR markers around Hebrew-script text, so I'm operating under the premise that maybe we'll do that someday. If so, then in the rare cases that we really want the context containing Hebrew-script text to be RTL, we would have to explicitly include RTL markers; and in the meantime, they're harmless; so it seemed like a sort of future-proofing. —Ruakh 16:03, 13 January 2013 (UTC)
Hackish? Yeah, kinda. I honestly just did it because it was easier that way (somewhat embarrassingly, while singing the call-and-response אלפבית song...) —Μετάknowledgediscuss/deeds 16:50, 13 January 2013 (UTC)


I don't speak IPA, but something tells me 接管 is not pronounced /ʨk/. ---> Tooironic (talk) 04:23, 13 January 2013 (UTC)

I agree. —Ruakh 04:28, 13 January 2013 (UTC)
  Done. Μετάknowledgediscuss/deeds 05:26, 13 January 2013 (UTC)
Well, you fixed that one entry — and thank you :-) — but that doesn't really solve the overall problem . . . —Ruakh 06:17, 13 January 2013 (UTC)
Tooironic only complained about one entry. As you know better than I, you can certainly scan a database dump for Mandarin IPA significantly shorter than the pīnyīn values, which should bring up all examples of this bug, and I'm sure that between me and the Mandarin regulars we can fix them all by hand. —Μετάknowledgediscuss/deeds 16:54, 13 January 2013 (UTC)
Wait, really? I guess it never occurred to me that y'all would be willing to do that. The list of all 183 problematic entries is at User:Metaknowledge/py-to-ipa-problems. Thank you! :-D   —Ruakh 17:07, 13 January 2013 (UTC)
Сумпор! Wait, wrong language. Uh, um, 了不起!(or something like that, not sure if I got it right...) While we're at it, can you explain why this bug even happened? —Μετάknowledgediscuss/deeds 17:11, 13 January 2013 (UTC)
I really don't know. Either the template was broken to begin with and no one even noticed, or something broke in the template during the process of making it substitutable. (The latter case has two subcases: the breakage could have been substitution-specific — like, say, maybe an #if: was made safesubstitutable, but its condition contained a nonsafesubstituted template, in which case the order of evaluation would have been such that the #if: would misbehave — or the breakage could have been general, like, some part of the template accidentally got deleted during that process. In either of those subcases, it's most likely, but not certainly, my fault.) In either case, the problem wasn't noticed until well after it had been substituted everywhere. I tried to go back later and figure out what had happened, but I couldn't: the template was just too messy and indecipherable (and it didn't seem like a high priority, since figuring out the problem would not really help in reversing the problem). —Ruakh 17:18, 13 January 2013 (UTC)
  • Re: "Who knows how many IPA transcriptions it has stuffed up": I don't think anyone knows for sure, but it's presumably either zero (if they were already messed up) or 183 (if they weren't).   Re: "Is anyone going to address this?": Well, mostly I'd been quietly ignoring the issue out of a sense of frustration over the whole thing. (The substitution problems were just the last straw in the whole mess of dealing with this template.) But Metaknowledge (talkcontribs) has now offered his and your assistance in fixing all of them. :-P   If that doesn't work out . . . I'm not capable of fixing these broken pronunciations, but I can certainly go through and remove them, if people want. —Ruakh 03:33, 14 January 2013 (UTC)
If you are not capable of fixing the mess afterwards, then don't mess with it in the first place. Substituting the unchanged template with related string templates missing will of course generate erroneous pronunciations. To fix all this, restore the related string templates (even if temporarily), replace all {{IPA|...|lang=cmn}} with {{subst:py-to-ipa|... (parameters from {{cmn-...|pin=***}}) }}. 03:43, 14 January 2013 (UTC)
The related string templates weren't missing at the time. But, uh, nice try. Better luck next time? —Ruakh 03:49, 14 January 2013 (UTC)
I don't know how many related templates you misdeleted whilst doing these substitutions and I don't care. Your mess anyway. 03:56, 14 January 2013 (UTC)
I didn't delete any of them. So: zero. Oh, but that's right, you just said that you don't care. So you're just trolling. Which is convenient for me, because I'm sick of replying to you, and now I don't have to: trolling is grounds for blocking, so the next time you comment here, I can just revert & block. Problem solved. :-)   —Ruakh 04:02, 14 January 2013 (UTC)
@Tooironic: I really intend to do it, just not today. Wanna help? —Μετάknowledgediscuss/deeds 04:04, 14 January 2013 (UTC)
Sorry to say this but before this is done en masse, the IPA on 接管 is incorrect, cf. the zh.wikt page. Tone sandhi has not been taken into account because whoever was generating this pronunciation was relying on User:Wjcd/py-ipa, a tool that only generates IPA for monosyllabic pinyin. 04:47, 14 January 2013 (UTC)
Er, I can't understand the numerical notation they use... the only tone sandhi rules I know are bordering 3rds make the first one(s) 2nd(s), 3rds followed by other tones generally don't come back up, and 一 and 不 are exceptions. What am I missing? —Μετάknowledgediscuss/deeds 04:58, 14 January 2013 (UTC)
The actual picture if it is to be represented by IPA is more complex than those two rules. There are basically six rules, and these rules make the third tone non-existent in compounds. The four tones in Beijing Mandarin are value-wise 55, 35, 214, 51 (Superscript numbers 1-5 are equivalent to IPA tone letters ˩˨˧˦˥). When they combine,
1) 55/35/51 + 214 = 211 + 214;
2) 214 + 214 = 35 + 214;
3) 214 + ø = 21(4);
4) non-ø + 211/214 + non-ø = non-ø + 1 + non-ø;
5) 51 + 51 = 53 + 51;
6) tone sandhi of 一 and 不.
Apply these rules repeatedly, until a stable tonal profile is obtained, where no sandhi rule from above can be applied any more. This gives the final IPA pronunciation. 05:13, 14 January 2013 (UTC)
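The "apply until stable" procedure is a fixed-point rewrite, sketched below in Python. Only rules 2 and 5 are encoded, because they are stated as simple rewrites of an adjacent pair; rules 1, 3, 4 and 6 are left out, and the left-to-right scan order is an assumption of the sketch, not part of the rules as given. The tone-letter mapping follows the 1-5 ↔ ˩˨˧˦˥ correspondence stated above.

```python
# Chao tone digits to IPA tone letters, as given above (1 = lowest, 5 = highest).
TONE_LETTERS = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}

# Only rules 2 and 5: each maps an adjacent pair to a new value for its FIRST tone.
PAIR_RULES = {
    (214, 214): 35,  # rule 2: 214 + 214 -> 35 + 214
    (51, 51): 53,    # rule 5: 51 + 51 -> 53 + 51
}


def apply_sandhi(tones: list[int]) -> list[int]:
    """Apply PAIR_RULES repeatedly until no rule matches (a stable profile)."""
    tones = list(tones)
    changed = True
    while changed:
        changed = False
        for i in range(len(tones) - 1):
            new = PAIR_RULES.get((tones[i], tones[i + 1]))
            if new is not None:
                tones[i] = new
                changed = True
                break  # rescan from the left on the updated profile
    return tones


def to_tone_letters(value: int) -> str:
    return "".join(TONE_LETTERS[d] for d in str(value))


profile = apply_sandhi([214, 214, 214])
print(profile)                                        # [35, 35, 214]
print(" ".join(to_tone_letters(t) for t in profile))  # ˧˥ ˧˥ ˨˩˦
```

Each rule replaces the first tone with a value (35 or 53) that cannot start another rule, so the loop always terminates.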
Wow, thank you! I feel enlightened (and somewhat miseducated). I'll memorize this method straightaway (taking notes for the time being). Anything else that I ought to know but probably don't from using py-to-ipa and reading online guides written by non-linguists? —Μετάknowledgediscuss/deeds 05:54, 14 January 2013 (UTC)
  • Re above, I would help but I know nothing about IPA. Nor am I willing to learn. So many other Mandarin-related tasks to be done. Good luck! ---> Tooironic (talk) 03:50, 16 January 2013 (UTC)
    Oh, one small thing. I noticed that you've tagged these pronunciations as "Beijing". Are they really? I assume they're just Standard Mandarin.... ---> Tooironic (talk) 03:52, 16 January 2013 (UTC)
    Did I do that? I intended Putonghua. —Μετάknowledgediscuss/deeds 03:54, 16 January 2013 (UTC)
    Actually, that's how it was created. I don't know enough to challenge that. —Μετάknowledgediscuss/deeds 03:56, 16 January 2013 (UTC)

Tbot support


There are some questions/requests in User talk:Ruakh/Tbot.js#Testing_and_making_it_work_with_other_languages you may have missed. --Anatoli (обсудить/вклад) 03:29, 14 January 2013 (UTC)

Thanks. I saw the comments, but I didn't know quite how to reply . . . I'll try. —Ruakh 03:51, 14 January 2013 (UTC)

Aramaic prefixes and suffixes

I noticed an issue with our Aramaic prefix and suffix entries. It seems that whoever added them put the hyphen on the wrong side (I presume to circumvent bidirectionality issues), in addition to the fact that for the Hebrew script it should be a makaf. I corrected several of the Hebrew script ones and even noticed that you yourself fixed to ל־ a while back. The problem is I don't know how many more of them there are and I haven't even gotten to fixing the Syriac script ones. I was wondering if you could generate a list of entries whose titles match "-.*|.*-" and whose body contains an Aramaic L2. If there are too many of them, maybe you could even use your bot to fix them?

I would do it myself but I can't get Python to stop throwing Unicode errors as it reads the dump (which I have found to be caused by a broken XML reader library). Also, would you happen to know if the Syriac script has a version of the hyphen analogous to the Hebrew makaf?

Thanks. --WikiTiki89 03:26, 22 January 2013 (UTC)
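A minimal Python sketch of the requested dump query, using the standard library's streaming XML parser. The function and pattern names are hypothetical. The title test applies "-.*|.*-" to the whole title, so it catches titles that start or end with an ASCII hyphen (but not a makaf), and the L2 test looks for a literal ==Aramaic== header.

```python
import io
import re
import xml.etree.ElementTree as ET

# "-.*|.*-" applied with fullmatch: the title starts or ends with "-".
AFFIX_TITLE = re.compile(r"-.*|.*-")
# A level-2 header line such as "==Aramaic==".
ARAMAIC_L2 = re.compile(r"^==[ \t]*Aramaic[ \t]*==[ \t]*$", re.MULTILINE)


def _local(tag: str) -> str:
    # Dump elements carry an export-schema namespace; compare local names only.
    return tag.rsplit("}", 1)[-1]


def find_affix_pages(stream, l2=ARAMAIC_L2):
    """Yield titles that look like affixes and whose text has a matching L2."""
    title = ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            if AFFIX_TITLE.fullmatch(title) and l2.search(elem.text or ""):
                yield title
        elif name == "page":
            elem.clear()  # keep memory bounded on a full dump


if __name__ == "__main__":
    sample = (
        '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">'
        "<page><title>-\u05d3\u05d9\u05dc</title>"
        "<revision><text>==Aramaic==\nfoo</text></revision></page>"
        "<page><title>ab-cd</title>"
        "<revision><text>==Aramaic==\n</text></revision></page>"
        "</mediawiki>"
    ).encode("utf-8")
    print(list(find_affix_pages(io.BytesIO(sample))))
```

On a real dump, something like `find_affix_pages(bz2.open(path, "rb"))` streams the compressed file; passing bytes rather than pre-decoded text leaves character decoding to the XML parser, which may sidestep the kind of Unicode errors described above. For entries under other headers, a different `l2` pattern can be passed, e.g. `re.compile(r"^==[^=\n]*Syriac[^=\n]*==[ \t]*$", re.MULTILINE)` for any L2 containing "Syriac".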

As of the January 10th dump, we had four: ,‎ ,‎ ,‎ and -דיל. So presumably we now have none, though is broken now. (I didn't filter by script or anything, so apparently no Syriac-script entries had this problem.) As for whether Syriac has something makaf-like — I really have no idea. —Ruakh 05:46, 22 January 2013 (UTC)
Our resident Syriac experts are 334a (talkcontribs) (for the Classical language) and Rafy (talkcontribs) (for the modern language). —Μετάknowledgediscuss/deeds 05:48, 22 January 2013 (UTC)
Some or all of the entries in question were added by 334a, so I don't know how much he knows about Unicode. The problem might be a little more complicated, though, because I encountered Syriac-script links with this problem; I guess they must have been redlinks (I only looked at them in the wikitext, so I wouldn't have noticed). And I assume it would be harder to find the problem in links than in entries.
Also isn't really broken, I redirected it to ־ו (the Hebrew 3rd person singular possessive suffix). When I realized the latter didn't exist, I was too lazy to create it. --WikiTiki89 06:01, 22 January 2013 (UTC)
I'm not sure what "broken" means to you, but to me, a redirect to a redlink is a broken redirect: "broken"! (I mean, I often create a redirect a few minutes before creating the entry it redirects to, but יש גבול.) —Ruakh 06:25, 22 January 2013 (UTC)
Fine, I created ־ו. --WikiTiki89 06:46, 22 January 2013 (UTC)
Thank you. :-)   —Ruakh 06:49, 22 January 2013 (UTC)
It turns out the Syriac-script entries with this problem use the L2 "Classical Syriac" rather than "Aramaic". Can you do another dump analysis, maybe for any L2 header containing the string "Syriac"? --WikiTiki89 18:48, 22 January 2013 (UTC)
As of last night's database dump, there were only two: and . —Ruakh 03:34, 23 January 2013 (UTC)
I guess it just shows how poor our coverage of Aramaic/Syriac is. --WikiTiki89 05:15, 23 January 2013 (UTC)

How to use Category:Hebrew personal pronouns?

I noticed that you reverted (all?) my edits where I added Hebrew personal pronouns to Category:Hebrew personal pronouns. If you look at that category, it is now clearly missing many of Hebrew personal pronouns. Do you disagree that those are Hebrew personal pronouns, or do you find that category useless, or something else? --Thv (talk) 06:46, 25 January 2013 (UTC)

I think the category is fine; the problem isn't that you added entries to it, but that you removed entries from Category:Hebrew pronouns. The headword line should still be {{head|he|pronoun|…}}; Category:Hebrew personal pronouns should be added explicitly at the end of the language section. (Sorry; I had intended to either fix these myself afterward, or leave you a message about it, but then it slipped my mind. Thanks for asking about it.) —Ruakh 15:10, 25 January 2013 (UTC)

Page links dump and invalid page IDs

There are around 5000 links in the pagelinks dump that have nonexistent page IDs (the ID is not present in the main dump). Do you know what the deal is with these? DTLHS (talk) 03:24, 26 January 2013 (UTC)

If a page-ID is in the range [1, 3846626], but is not present in the latest pages-articles.xml, then I assume that usually (always?) means it was assigned to a now-deleted page. If such a page-ID occurs in pagelinks.pl_from, then I imagine it's simply that the link-record failed to be deleted when the page was. (I don't know anything about the history. Honestly, I wouldn't have been shocked if MediaWiki had simply never deleted such records, but if you only found around 5000 such links, then that's apparently not the case.) Do you notice any obvious pattern in these IDs, like, do they mostly clump in narrow ranges, or anything like that? —Ruakh 03:52, 26 January 2013 (UTC)
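To check for the kind of clumping asked about, one could collect the orphaned pl_from IDs and collapse them into runs. This Python sketch assumes the pl_from values and the set of page IDs from pages-articles.xml have already been parsed out (that parsing is not shown); the function names and the `max_gap` parameter are hypothetical.

```python
def orphan_ids(pl_from, existing):
    """Sorted distinct pl_from IDs that do not appear among the dump's page IDs."""
    return sorted({pid for pid in pl_from if pid not in existing})


def clump(ids, max_gap=1):
    """Group sorted IDs into (first, last) runs; IDs at most max_gap apart share a run."""
    runs = []
    for pid in ids:
        if runs and pid - runs[-1][1] <= max_gap:
            runs[-1][1] = pid
        else:
            runs.append([pid, pid])
    return [tuple(r) for r in runs]


existing = {1, 10}
print(clump(orphan_ids([1, 2, 2, 5, 6, 7, 10], existing)))  # [(2, 2), (5, 7)]
```

A few long runs would suggest batch deletions; thousands of one-element runs would suggest stray records left behind one page at a time.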

On certain Latvian grammatical words

Since you asked me a question about the Latvian red links in grammatical templates... The phrases vīriešu dzimte, sieviešu dzimte correspond to masculine and feminine respectively. Now dzimte is one of those 19th-century "neologisms", derived from dzimt "to be born", and means simply "(grammatical) gender". So in principle they should be linked as two words -- "vīriešu" and "dzimte", "masculine" and "gender", i.e., SoP, right? Or should I assume that "vīriešu dzimte" is a phrase simply because it is the "official" name of the masculine gender, it is official grammatical terminology? --Pereru (talk) 15:11, 26 January 2013 (UTC)

Wiktionarians have argued on this point since time immemorial, and the results of those arguments have been very inconsistent. I think the safest route is probably to link to each word separately, rather than to link to a two-word phrase that may or may not be idiomatic. —Ruakh 16:55, 26 January 2013 (UTC)

February 2013

Linking and tabbed languages

Hello Ruakh --

I saw your recent link fix at ニゴロブナ, thanks for that. Your edit comment brought back to my mind an idea I've been toying with for a bit, that of creating a template for listing JA terms, similar to {{l|ja}} but specifying the lang for the two transliterations (kana and romaji).

For instance, the JA editors I'm aware of (myself, Haplology, I think Anatoli and James Jiao) have used wikicode formatting in lists like that seen at 御#Derived_terms:

* {{l|ja|御子|tr=[[みこ]], ''[[miko]]''}}: a shrine maiden

Your mention that this might screw up tabbed languages makes me wonder if this format is sufficient, but I don't use tabbed languages and don't really know. My idea was to leverage {{l|ja}} into something that might look like {{l-ja|御子|みこ|miko}} and be equivalent to {{l|ja|御子}} ({{l|ja|みこ}}, ''{{l|ja|sc=Latn|miko}}''), with all links properly pointing to the correct language.

Your thoughts? -- Eiríkr Útlendi │ Tala við mig 06:24, 6 February 2013 (UTC)

PS -- no worries about the other day; we all have days like that. o_O
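To pin down the proposed expansion, here is a hypothetical Python helper that emits the wikitext that {{l-ja|御子|みこ|miko}} is described above as being equivalent to. It only illustrates the intended output; it is not the code of any existing template.

```python
def ja_link(kanji: str, kana: str, romaji: str) -> str:
    """Wikitext that {{l-ja|kanji|kana|romaji}} is proposed to stand for (hypothetical)."""
    # Each form gets its own {{l|ja}} link; the romaji link is tagged sc=Latn and italicized.
    return "{{l|ja|%s}} ({{l|ja|%s}}, ''{{l|ja|sc=Latn|%s}}'')" % (kanji, kana, romaji)


print(ja_link("御子", "みこ", "miko"))
# {{l|ja|御子}} ({{l|ja|みこ}}, ''{{l|ja|sc=Latn|miko}}'')

# A spelling with several readings: one link group per reading.
for kana, romaji in [("うおつり", "uotsuri"), ("さかなつり", "sakanatsuri")]:
    print(ja_link("魚釣り", kana, romaji))
```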

I created {{ja-l}} a few months back for this purpose, doing exactly what you describe; but it doesn't seem to have had any uptake. Since then, quite a few more such language-specific {{l}} templates have been created, as subpages of {{l}}, so I guess we should create {{l/ja}}. (It can just redirect to {{ja-l}}. Or we can do it the other way, moving {{ja-l}} to {{l/ja}}.) —Ruakh 15:27, 6 February 2013 (UTC)
That template looks more complicated than just a simple linking template, though. Do you think it is a viable candidate for {{l/ja}}? —CodeCat 16:56, 6 February 2013 (UTC)
Yes. At least, I think that any {{l/ja}} would need to include all of this complexity. No? —Ruakh 02:38, 7 February 2013 (UTC)
I'd prefer it if the basic linking templates were kept as simple as possible. That doesn't mean a template that combines them can't be created, but I would imagine there are situations where the extra code isn't necessary. It's always possible to make a bigger template out of smaller ones, but the reverse is not true, so the common denominator should probably be kept low. —CodeCat 02:55, 7 February 2013 (UTC)
But I think that this is as simple as it gets for Japanese. Or at least, it's the simplest thing that could be called "l". (I mean, I'm not dogmatic about it. If you have an idea for how you think {{l/ja}} should look, I would certainly keep an open mind. But that's how it seems to me right now.) —Ruakh 05:49, 7 February 2013 (UTC)
I was thinking that it would only link to one word, so it would resemble {{l/sh/Cyrl}} but with another script. We often link to Serbo-Croatian words in pairs (both scripts, since both get an entry) but the linking templates don't support this directly since it is often desired not to link to a pair of words. I figured Japanese could work the same way, with the three (kanji, hiragana, romaji) or four (katakana too) representations of the word linked individually. {{ja-l}} already makes multiple links, so it can act as a convenient replacement for multiple instances of {{l/ja}} together. —CodeCat 14:21, 7 February 2013 (UTC)
I see what you're saying. I guess the thing is, that may be what {{l}} should be, but it's not what it is: it may have been intended, in part, as a single-link template, but what it is is an approximation to {{onym}}. (An inferior one, granted, but a very widely used one.) So I think {{l/ja}} needs to be what {{ja-onym}} would be, if {{ja-onym}} existed. —Ruakh 03:11, 8 February 2013 (UTC)
  • Hmm, and thinking it through further, this becomes a more complicated problem space -- some JA terms spelled in kanji have multiple readings, such as 魚釣り, which could be read as either うおつり uotsuri or さかなつり sakanatsuri. This is common enough that the template should ideally be able to handle probably up to three pairs of kana/romaji readings. The logic used at {{compound}} might be a useful reference. -- Eiríkr Útlendi │ Tala við mig 17:17, 6 February 2013 (UTC)
  • I think that with those, the best approach is to use the template multiple times. I mean, they're separate words in that case, no more related than English live (to be alive) /lɪv/ and live (in person or in real time) /laɪv/. —Ruakh 02:38, 7 February 2013 (UTC)
FWIW, I've never used {{ja-l}} simply because I didn't know it existed. (^^); -- Eiríkr Útlendi │ Tala við mig 17:19, 6 February 2013 (UTC)

I'm sorry... can we try to sort it out?

I'm sorry for my outburst in the Grease Pit. Sometimes I get a bit overly emotionally attached to certain things because I have a strong idea of what is right or wrong. I was also a bit frustrated at the prospect of having yet another public discussion be derailed by the issue, when it's clearly just between us. I'd like to understand what the problem is in a calmer setting.

First let me explain what I think is your stance, so that we can get any misconceptions out of the way? As far as I'm aware, you prefer certain code templates to have prefixes so that there is a technical barrier for their usage, and only templates that have been explicitly coded around that barrier will accept such codes. I also think that you want those prefixes to be used so that Wiktionary users realise that they are not "normal" codes, and will hopefully act accordingly when using them. I remember you saying something like "different things should look different". Is that correct?

Now, with Lua, we don't actually have a need for the prefixes themselves, because there are other ways to implement "prefixes". For example, we could put them in separate modules, so that anyone who is using those codes will be explicitly aware that they are to be imported and used from a distinct location. So even if a technical barrier is desirable, there may be better and more "Lua-like" ways to do it than with prefixes embedded in the code string. My objection to these technical barriers in general is that while I do agree that different things should look different, regular codes don't actually work that differently from reconstructed codes. In fact, as far as I'm aware, they only differ when linking is concerned, since reconstructed languages are placed in a different location and use a different naming scheme. And implicitly that means that all entries need a sort key, but sort keys may be desirable for non-Appendix languages too (which Lua will be a tremendous help for, by the way!). All other uses are the same: expanding their names (like in the multitude of category boilerplate templates), categorising their entries (Category:Proto-Germanic language is no different from Category:English language), and so on.

So my objection is that while it may be good to make the difference explicit, it also makes it more difficult to handle all cases when there are no differences at all. A template like {{poscatboiler}} should not need to have special support for reconstructed codes, because it treats them exactly the same as regular codes. Similarly for {{head}}, which should just work fine for appendix languages, except for the links. When every template needs special support, it implicitly disables that template for reconstructed languages until someone takes the time to fix it. Which can be frustrating at times. —CodeCat 14:48, 15 February 2013 (UTC)

I accept your apology. I'm sorry, too.
Re: "I was also a bit frustrated at the prospect of having yet another public discussion be derailed by the issue, when it's clearly just between us": I don't follow. It seems to me that the only way that it can be "just between us" is if absolutely no one else cares one way or the other; and if that were the case, then public discussion would be both unnecessary and impossible.
Re: "As far as I'm aware, you prefer certain code templates to have prefixes so that there is a technical barrier for their usage, and only templates that have been explicitly coded around that barrier will accept such codes": This is not true. I actually really hate the templates that try to "code around" this — {{langprefix}} and so on.
Re: "I also think that you want those prefixes to be used so that Wiktionary users realise that they are not 'normal' codes, and will hopefully act accordingly when using them. I remember you saying something like 'different things should look different'": Yes.
Re: first half of your third paragraph, from "Now, with Lua, we" to "prefixes embedded in the code string": This section presupposes that I want a technical barrier. Now that I've clarified that I don't want that, I think this section is obsolete?
Re: second half of your third paragraph, or more specifically, re: "regular codes don't actually work that differently from reconstructed codes": I think they do. I think that the difference between "languages we include" and "languages we don't include" (such as reconstructed languages) is an absolutely fundamental distinction. (By comparison: I'm sure you wouldn't want {{term|parabola|lang=la}}, {{term|parabola|lang=fr}}, {{term|parabola|lang=es}} to generate “parabola, parole, palabra” on the grounds that the only difference between the three words is the location of the entry for them.)
To put this another way — if you really think that reconstructed languages should be treated just like regular languages, then you should propose that they be put in mainspace. Or if you really think that they belong in appendices, then we shouldn't be treating them like regular languages in all these other respects. But I think that this half-measure — putting them in appendices, but then pretending that these appendices are just regular entries — is really the worst of both worlds.
Re: your last paragraph: I mostly agree, except that I reach the opposite conclusion: this is exactly why these prefixes need to appear directly in the wikitext, rather than forcing templates like {{poscatboiler}} to add special logic to hack around the missing prefixes.
Ruakh 21:30, 16 February 2013 (UTC)
Ok, I think I understand, but I don't quite agree. What exactly is there to gain from making editors so aware of the distinction? I mean, is there ever a problem when they're not aware, given that it's pretty clear through other means that we treat certain languages differently? Do we really need to make it more obvious that Proto-Germanic belongs in an appendix? And why do you think that adding "proto:" to a language code will somehow enable editors to make that connection? To illustrate this: several editors have, in the past, added {{head}} to reconstructed entries. That is the kind of situation where I think that {{head}} should just work. People expect it to work, and don't understand why it doesn't. I highly doubt that putting a prefix on the code will make even the slightest difference; after all, the editors in question were already very much aware that the language was different, because it was in the appendix namespace and had a different name. Yet that fact apparently did not prevent them from concluding that {{head}} should work there as it does in mainspace. So given that reality, adding "proto:" to the code seems like nothing more than pointless bureaucracy, which really does not help in the slightest with what you intend it to do. I agree with you that {{langprefix}} was a bad idea, but it was created because there was a need for it. That need would not have existed if we had decided back then to drop the prefixes from the templates. So, perhaps ironically, {{langprefix}} would not have existed today had you conceded then. —CodeCat 04:59, 17 February 2013 (UTC)
I'm not suggesting that we add proto: to the language code: it's already there. (See e.g. {{proto:gem-pro}}.) I'm suggesting that we not remove it. (And I'm not sure what you mean by "pointless bureaucracy", anyway, since assigning codes is inherently an exercise in bureaucracy, and there's no difference bureaucracy-wise between assigning codes like proto:gem vs. gem-pro vs. Proto-Germanic.) And you're right about {{langprefix}}: if I hadn't insisted on making a distinction, or if you hadn't insisted on making no distinction, or if Daniel hadn't insisted on making all templates as complicated as possible, then we wouldn't have ended up with the worst-of-all-worlds situation that we have now. —Ruakh 15:40, 17 February 2013 (UTC)
Well, considering that we'd still want to mark our text HTML-wise as Proto-Germanic, we'd necessarily have to use something like "proto:gem-pro". Being HTML-correct is really the main reason why we don't use the prefixes as part of the "proper" code, and the consequence is that the "proto:" that is part of the language template's name becomes nothing more than a technical barrier, which in turn necessitated {{langprefix}}. I think we both seem to agree that it's not a good thing. So which options do we have?
  • Keep as it is. That seems like the most complicated option to me, because prefixed template names don't easily translate to prefixed names in Lua. Unless we somehow decide to store the codes with the prefixes, and add them each time we want to look them up (essentially, Lua-fy {{langprefix}} along with the rest)? That seems rather cumbersome, and like I mentioned, we could just decide to use different tables for reconstructed languages to the same effect, if this is what we want. Nevertheless, if we decide that the method for accessing codes is going to be different for different types, then some analogue to {{langprefix}} is going to be inevitable.
  • Remove the prefixes entirely (use just "gem-pro" everywhere). This will probably be the easiest to implement, because we already use prefixless codes in all of our entries. This option allows us to get rid of any nastiness that prefixes involve, and has the benefit that the code that is typed in articles is the same that will appear in the HTML lang= attribute.
  • Add the prefixes as part of the canonical code (use just "proto:gem-pro" everywhere). This will require a bot to update all the uses. This makes the prefix somewhat redundant because the code already ends with -pro, but that will not apply to, for example, Klingon (conl:tlh in this scheme). It does have the advantage (to you at least) that it's explicit that the code is "different". However, a disadvantage is that any code that creates HTML (which in practice will be most if not all) will have to remove the prefix before putting it in the lang= attribute. Essentially then, we end up with a kind of "reverse langprefix", except one that will need to be used far more often than langprefix itself currently is.
  • Some other option?
CodeCat 17:30, 17 February 2013 (UTC)
lang="gem-pro" is not valid, so we actually don't (or at least, shouldn't) want that in our HTML. —Ruakh 01:39, 18 February 2013 (UTC)
But then neither should any of our other exceptional codes. Does HTML actually mandate the use of ISO-639 as part of its standard? And if so, what does it say that should be done with content that is not in an ISO-recognised language? —CodeCat 02:25, 18 February 2013 (UTC)
Re: first sentence: Yes, I absolutely agree. (This also applies, in a slightly different way, to a handful of exceptional codes used by WMF in general, rather than en.wikt specifically.)   Re: second sentence: Not exactly ISO 639, but lang="gem-pro" isn't valid. What the spec requires is that the value of lang="…" "be a valid BCP 47 language tag, or the empty string", where the empty string means "the primary language is unknown".[1] A BCP 47 language tag is not the same as an ISO 639 language code, though they're related, and have a lot of overlap. (For example: es "Spanish" is both; es-ES "Spanish Spanish" is a valid BCP 47 language tag but not an ISO 639 language code (the es subtag is an ISO 639 language code, the ES subtag is an ISO 3166 country code); and spa "Spanish" is a valid ISO 639 language code but not a valid BCP 47 language tag (because BCP 47 requires that two-letter codes be preferred to their three-letter synonyms).) For a friendly-but-thorough guide to BCP 47, see and   Re: third sentence: It really depends. In the case of gem-pro, we can keep the gem subtag, but the pro subtag is invalid (it means Old Provençal, and can only be used as the first subtag of a language tag). We might write lang="gem" (perfectly valid, but vague: it means "some Germanic language"), or we might write something like lang="gem-x-proto" (also valid: x means "private use", and everything after it merely has to be syntactically valid, since its semantics are defined by private agreement). Obviously neither of these is ideal, but I think we're unlikely to find any better approach. (You can read the three pages I linked to, and form your own opinions.) —Ruakh 03:18, 18 February 2013 (UTC)
I remember reading something about x- but I wasn't sure what it would be needed for. As I understand it right now, a tag is split into parts with hyphens as separators. When a part is "x" it means "everything following is nonstandard". So the x-pro part means "do not try to interpret 'pro' as you normally would". If I'm not mistaken, "x-gem-pro" would also be valid? But browsers would not be able to understand from such a code that it is Germanic, whereas with "gem-x-pro" they would parse "gem". Is that correct? —CodeCat 03:41, 18 February 2013 (UTC)
Yes, that's correct. (Well, except the word "parse". Technically even the stuff after x is still parsed, so it still has to be syntactically valid. Something like lang="gem-x-proto_germanic" would not be valid. But I assume you're just using the word "parse" colloquially, and I shouldn't read too much into it?) —Ruakh 04:01, 18 February 2013 (UTC)
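To make the private-use rule concrete, here is a minimal Lua sketch (the function name private_use_ok is hypothetical) that checks only that part of the syntax: after a standalone x subtag, every subtag must be 1 to 8 ASCII letters or digits, which is why gem-x-proto passes and gem-x-proto_germanic fails. It deliberately skips everything else in BCP 47, including registry lookups, so it cannot tell you whether gem-pro is valid.

```lua
-- Hypothetical helper: checks ONLY the BCP 47 private-use rule, i.e. that a
-- standalone "x" subtag is followed by one or more subtags of 1 to 8 ASCII
-- letters/digits. Other subtags are not checked (no registry lookup).
local function private_use_ok(tag)
    local seen_x = false
    local after_x = 0
    -- Appending "-" lets the pattern capture the last (and any empty) subtag.
    for subtag in (tag .. "-"):gmatch("([^%-]*)%-") do
        if seen_x then
            -- %W matches any non-alphanumeric character, e.g. "_"
            if #subtag < 1 or #subtag > 8 or subtag:find("%W") then
                return false
            end
            after_x = after_x + 1
        elseif subtag:lower() == "x" then
            seen_x = true
        end
    end
    -- A trailing "x" with nothing after it is not well-formed.
    return not seen_x or after_x > 0
end
```

So private_use_ok("x-gem-pro") is also true, matching the point above that browsers simply won't recognize "gem" in that form.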
Yes sorry, I meant it more as that it doesn't try to interpret the meaning of what it reads. In any case, if that is how it is, then I think we should change the codes we currently use so that they match the standard. It seems a bit hypocritical that we worry about other parts of the lang= attributes but ignore this. I think the easiest way would be to insert -x- in all the exceptional codes, but we could also change -pro to -proto if that is allowed. We could, as an exception, decide to leave out that part in template names, so that the old name is still used for naming templates (out of convenience), like {{gem-verb}} or {{ine-noun}}. Going back to your desire to make proto-languages appear distinctive... do you think that it's distinctive enough if the code ends in "-pro(to)"? —CodeCat 04:14, 18 February 2013 (UTC)
Yeah, {{context|…|lang=proto:gem-x-proto}} might be overkill. Personally I'd prefer {{context|…|lang=proto:gem}}, but I think I can accept {{context|…|lang=gem-x-proto}} as a compromise. (It's not ideal from my standpoint, because the -x- is really there because Ethnologue doesn't include the language, rather than because we don't — we might end up having -x- in some languages we allow, and lacking it in some languages we don't — but I think I can accept it.) —Ruakh 04:33, 18 February 2013 (UTC)
So you would prefer -proto over -pro then? What would be the next step for this change? We'd probably want to discuss it more widely first... —CodeCat 04:39, 18 February 2013 (UTC)
Yes, I'd prefer -proto over -pro. (My impression is that the only reason -pro was introduced is that someone thought it was more standard to use a three-letter subtag. Which is sort-of true — real extension subtags are three letters — but pro isn't and can't be a real extension subtag, so it's nonstandard either way.) —Ruakh 04:42, 18 February 2013 (UTC)
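A rough sketch of how a module might keep the existing codes internally but emit a valid value in lang=. Both the helper name html_lang and the mapping rule (any code of the form family-pro becomes family-x-proto, following the -proto preference above) are assumptions for illustration, not an agreed convention.

```lua
-- Hypothetical: map an internal code to a value for the HTML lang= attribute.
-- Assumption: "<family>-pro" denotes a proto-language and becomes
-- "<family>-x-proto"; every other code is passed through unchanged.
local function html_lang(code)
    local family = code:match("^(%l%l%l?)%-pro$")
    if family then
        return family .. "-x-proto"
    end
    return code
end
```

Keeping the conversion in one helper would let template names such as {{gem-verb}} stay as they are, as suggested above.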


Would appreciate some {{attention}} here if you have a sec…. Ƿidsiþ 07:36, 16 February 2013 (UTC)

Done. —Ruakh 21:31, 16 February 2013 (UTC)

Cheers. Ƿidsiþ 21:51, 16 February 2013 (UTC)

Another one:

! Ƿidsiþ 15:29, 19 February 2013 (UTC)

Whitelisting pages

Happy Purim. I'm almost positive you know how to whitelist pages; so do I in general, but this case is complicated enough that I'm afraid of getting it wrong and wonder if I can bother you, please, to handle it (if, of course, you agree). The relevant discussion was 'closed' here and here's a handy link to the JS.​—msh210 (talk) 05:37, 24 February 2013 (UTC)

Um, can somebody please create an entry for פורים? TIA —Μετάknowledgediscuss/deeds 06:52, 24 February 2013 (UTC)
Done.​—msh210 (talk) 07:33, 24 February 2013 (UTC)
Thanks! —Μετάknowledgediscuss/deeds 15:44, 24 February 2013 (UTC)

Name of Module:he-utilities

I had already created Module:sl-common for a similar purpose. It's probably better to use the same names, so which should we use? —CodeCat 01:37, 25 February 2013 (UTC)

Let me preface this by saying that I think we're in a bit of a discovery-and-experimentation phase, so for things that don't affect anything and are easily changed, it might not be worth worrying too much about consistency just yet. I mean, it's nice for different languages' modules to have similar names, so they're easier to discover and remember, but it doesn't really matter if they don't. Note that template-names are not always consistent between languages, and it's never really caused a problem; the difference in behaviors of different languages' templates has always dwarfed the difference in names. And while it's a bit ugly to do so, we can always create what might be called "shims" or "pseudo-redirects", e.g. creating Module:he-common as return require('Module:he-utilities'). (Disclaimer: not tested.) But I don't actually object to consistency, of course, and since you've raised the point, I'll reply.
Re: "common" vs. "utilities": I have no preference. To me the names imply slightly different things, but both seem applicable here. If you prefer "common", feel free to move Module:he-utilities accordingly.
Ruakh 03:34, 25 February 2013 (UTC)

Red (well, black) links in Latvian inflection tables

You had asked me about a month ago to add the Latvian words for certain grammatical notions that are mentioned in Latvian inflection tables. I have just finished doing that, and I've also crossed out the respective items at User:DTLHS/WantedPages. I just wanted to let you know. --Pereru (talk) 21:00, 25 February 2013 (UTC)

Thanks! —Ruakh 07:21, 26 February 2013 (UTC)

Lua: Calling a function through a string that has its name

In Module:nl-verb, the export.conjugate function has what is basically a switch statement that "forwards" the call to the correct function. But that could be written more neatly if I could just tell it to call "conjugate_" .. conj_type as though it were a function. In other words, to call a function through a string with its name (which you can construct dynamically). Do you know if it's possible to do this? —CodeCat 21:46, 27 February 2013 (UTC)

I don't believe there's any way to do exactly what you describe — AFAIK locally-scoped identifiers are only available statically — but it's easy to do approximately what you describe; I've edited Module:nl-verb to show what I mean. Actually, you were already most of the way there. —Ruakh 05:07, 28 February 2013 (UTC)
It should be possible in theory, because you can already put functions into tables, so maybe there is a table that stores all globals or something like that? I remember in PHP you could do this quite easily. —CodeCat 14:51, 28 February 2013 (UTC)
There is a table that stores all globals — it's _env — but due to the nature of lexical scope, local variables are a different beast entirely. (But why does it matter?) —Ruakh 15:36, 28 February 2013 (UTC)
Well, before you changed it, those functions were global, so they could have been called through that table without making one myself. On the other hand, for "security" reasons it might be better to explicitly restrict the list of callable functions (so that someone doesn't try to invoke, say, the function "make_table"). —CodeCat 15:47, 28 February 2013 (UTC)
Re: "those functions were global": Oh my gosh, you're right. I had assumed that function declared local identifiers, like in JavaScript, but it doesn't. That explains why you were able to call functions before their declarations.
I think it goes without saying that we should avoid global variables, including global functions.
Ruakh 15:57, 28 February 2013 (UTC)
I don't think that makes sense. Being able to order functions the way we want, without having to worry about declaring them in advance, is a very good thing. Functions are global in most other languages, too. Then again, I wonder what significance "global" has in this case. If a function is global in one module, can it be called globally from another module that imports it? —CodeCat 16:11, 28 February 2013 (UTC)
That is a fascinating question. I had simply taken for granted that "global" means "global" — why else, for example, would we be writing local p = {} in all our modules — but you are quite right to ask it, because as far as I can tell by testing, global variables are actually not shared between modules. Not only does a module not see the globals of a module that it imports, but what's more, even the MediaWiki-provided globals are not really shared. For example, every module has mw, but setting in one module does not affect other modules that import it. (Incidentally, the same is true of the debug console: it doesn't see globals created within the body of the module you're debugging.) So, yeah, disregard my previous statement: we can use "globals" all we want.
By the way, I mentioned _env above, but I was misreading the documentation: it's actually _G.
Ruakh 03:56, 1 March 2013 (UTC)
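A minimal plain-Lua sketch of the two approaches discussed above; the conjugation functions are hypothetical stand-ins, not the actual contents of Module:nl-verb. A function declared without local lands in _G and can be looked up by a constructed name, while an explicit table also works for local functions and limits which names can be invoked (the "make_table" concern). The per-module isolation of globals described above is Scribunto behavior and is not reproduced in standalone Lua.

```lua
-- Declared without "local", so it is stored in _G under this name.
function conjugate_weak(verb)
    return "weak: " .. verb
end

-- A local function is not in _G; it can only be reached through a table.
local function conjugate_strong(verb)
    return "strong: " .. verb
end

-- Explicit dispatch table: only these names can be invoked by string.
local conjugators = {
    weak = conjugate_weak,
    strong = conjugate_strong,
}

local function conjugate(conj_type, verb)
    local f = conjugators[conj_type]
    if not f then
        error("unknown conjugation type: " .. tostring(conj_type))
    end
    return f(verb)
end

-- Lookup by constructed name in the globals table (finds globals only).
local function conjugate_via_globals(conj_type, verb)
    local f = _G["conjugate_" .. conj_type]
    if type(f) ~= "function" then
        return nil
    end
    return f(verb)
end
```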

March 2013

Something funny

Lua's syntax requires that functions have names that are identifiers, but since functions can also be put in tables, you can really name them anything. I discovered that that includes:

local export = {}

export[""] = function(frame)
    -- ...
end

return export

I thought that was kind of funny, since I don't know many other languages that let you do this. —CodeCat 02:34, 2 March 2013 (UTC)

Maybe I'm missing what you're getting at, but it seems to me that the same is true in most scripting languages, including Perl ($foo{''} = sub { ... }), JavaScript (window[''] = function () { ... }), and Python (foo[''] = lambda : ...). You'll find this in any language that offers associative arrays and first-class functions. —Ruakh 07:04, 2 March 2013 (UTC)

How do you think I should do this?

I would like to work on converting most uses of {{head}} in Dutch to templates specific to Dutch (which are Lua-fied and faster). But many of them would essentially be the same, because for most of them only a headword is actually needed, nothing else. It definitely makes sense for there to be a single Lua function that does all of them (with a parameter to specify which PoS to categorise in). But I'm not sure what to do on the template side of things. The current approach taken by most languages is to have a single template for each PoS, so with Lua that would mean having many templates all invoke the same Lua function, with a parameter to specify the PoS. An alternative approach is to create {{nl-head}} and give it a parameter that is then passed onto Lua. Basically, it would move the PoS-parameter from being in the templates to being in the entries themselves. However, there is the danger that some editors will see {{nl-head|preposition}} and think, hey, I can probably write {{nl-head|noun}} too! And that's something we definitely don't want. —CodeCat 15:40, 5 March 2013 (UTC)

Re: first sentence: We really need to Lua-ify {{head}}, too. For obvious reasons, I think it'll be a while before we figure out languages well enough that we can really Lua-ify {{head}} properly, but I don't think there's much benefit to moving away from {{head}} in the meanwhile (at least for that reason). Besides, unless there are ==Dutch== entries with large numbers of POS sections, the Dutch-specific templates aren't "faster" in any meaningful sense. (Is a page that takes 18.1ms "faster" than a different page that also takes 18.1ms?)
Re: {{nl-head|noun}}: If that's really something that we never want, then I don't think there's really a problem, since {{nl-head}} can simply add a cleanup category — or simply call the function for {{nl-noun}}, which will add a cleanup category due to missing arguments.
Ruakh 16:41, 5 March 2013 (UTC)
You haven't really answered my question though. I am wondering whether having separate templates like {{nl-prep}}, {{nl-interj}} the way we currently do is preferable to having a single {{nl-head}} with a parameter. I do prefer it for consistency reasons, but I thought that some people might not like it because it leads to creating a separate template for every PoS (even if the Lua code behind each is shared). —CodeCat 18:06, 5 March 2013 (UTC)
I have no preference. This question seems better-suited to Wiktionary talk:About Dutch. —Ruakh 06:13, 6 March 2013 (UTC)
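A minimal sketch of the safeguard mentioned above: one shared function takes the part of speech as a parameter and adds a cleanup category when given one it is not meant to handle, such as noun. The function name, the list of allowed parts of speech, and the category names are all hypothetical.

```lua
-- Hypothetical: parts of speech that need nothing but a headword, mapped to
-- the plural form used in their category name.
local SIMPLE_POS = {
    preposition = "prepositions",
    interjection = "interjections",
    conjunction = "conjunctions",
}

-- Hypothetical cleanup category for misuse such as {{nl-head|noun}}.
local CLEANUP_CATEGORY = "Dutch entries needing headword cleanup"

-- Returns the headword and a list of category names for it.
local function simple_head(headword, pos)
    local categories = {}
    local plural = SIMPLE_POS[pos]
    if plural then
        table.insert(categories, "Dutch " .. plural)
    else
        -- Unsupported part of speech (e.g. "noun", which needs gender and
        -- plural information): flag the entry instead of accepting it silently.
        table.insert(categories, CLEANUP_CATEGORY)
    end
    return headword, categories
end
```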

Memory Stick and memory stick

Hi there. I wasn't aware that removal of (supposed) definitions like that required verification in that way (although it makes sense, I'm simply not hugely familiar with Wiktionary rules; I have spent a lot more time editing Wikipedia and am more used to the burden of proof being the other way round). Thanks for pointing me to rfv-sense. Alphathon (talk) 06:55, 7 March 2013 (UTC)

The burden of proof is still the same way 'round: the sense will be removed if no one presents evidence for it. (You don't have to present evidence for its nonexistence, or anything like that.) It's just that we prefer to leave the content in-place, with the warning tag, while the initial discussion is going on. —Ruakh 07:04, 7 March 2013 (UTC)

Split comma-separated genders

I am not sure if that is a good idea. Besides making the module slower, it would also end up enabling that behaviour for all templates, even though most of them would probably not need it. I would prefer it if the calling module would perform the split, rather than the gender-and-number module. —CodeCat 16:20, 8 March 2013 (UTC)

I figured that most templates would need it, and that it was best to handle it consistently in the ideal way, rather than having some templates split on comma, some split on space, some that support only a single gender/number specification, and so on. (Incidentally, if you want to change it to accept only commas or only spaces, I'd be down with that. I wasn't sure which was better, but supporting both is probably actually not good, because then people will be inclined to use a comma followed by a space, and the module doesn't currently handle empty specifications intelligently. Alternatively, of course, we could change the module to handle empty specifications intelligently.) The major group of templates that don't need it are the ones that completely generate their gender/number information internally (rather than taking it from template parameters), and of course, such templates can simply ignore this feature. I don't buy the "making the module slower" argument, because the module always calls split at least once anyway, so even if split were this incredibly expensive function that dwarfed all other aspects of the module, this would still only be a less-than-factor-of-two slowdown. —Ruakh 03:38, 9 March 2013 (UTC)
The slowdown for a single call will not be terribly significant, but this function may be called hundreds of times on a page because of the genders in translation tables. So every small amount can easily multiply, and it's probably a good idea to remove anything that isn't strictly necessary. I don't know which templates would actually need it, though. Can you give an example of a case where multiple genders can't be passed in as a list? —CodeCat 10:13, 9 March 2013 (UTC)
Templates don't support lists. (BTW, the Lua term is actually "sequence", but I'll use your terminology for this comment.) So any Luified template that accepts multiple genders from the user will have to offer some non-list mechanism for doing so. One approach is to have a series of separate parameters (say, 1 and g2 and g3) and then assemble them into a list. Another is to have a single parameter and split it into a list. I think the latter is clearly superior from the user's standpoint.
Re: "it's probably a good idea to remove anything that isn't strictly necessary": Fortunately, it's obvious that you don't actually believe that, because if you did, then the module would only contain export.format_single and export.COMMA (the latter being "'', ''"), and calling modules would assemble their genders into a string rather than into a list. Instead, you made an effort at encapsulation, at exposing only a single function, export.format, for software-engineering reasons. I think that's fine. But it is clearly inferior from a performance standpoint, and the only reason to do it is if we care about humans.
Ruakh 19:14, 9 March 2013 (UTC)
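The two user-interface options can be sketched side by side in plain Lua. The first assembles a list from separate parameters (1, g2, g3, ...); the second splits a single parameter on commas and/or whitespace and skips empty pieces, so "m, f" does not produce an empty specification. Both function names are hypothetical, and args stands in for frame.args.

```lua
-- Option 1: separate parameters 1, g2, g3, ... (args mimics frame.args).
-- Stops at the first missing or empty gN.
local function genders_from_params(args)
    local list = {}
    if args[1] and args[1] ~= "" then
        table.insert(list, args[1])
    end
    local i = 2
    while args["g" .. i] and args["g" .. i] ~= "" do
        table.insert(list, args["g" .. i])
        i = i + 1
    end
    return list
end

-- Option 2: one parameter, split on commas and/or whitespace; empty pieces
-- (from "m, f" or a trailing comma) are never produced by the pattern.
local function genders_from_string(spec)
    local list = {}
    for piece in string.gmatch(spec or "", "[^,%s]+") do
        table.insert(list, piece)
    end
    return list
end
```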
Ok, I understand that part. But the main module doesn't have to support the list-splitting itself. Take {{es-noun}}, for example, which accepts "mf" as a gender. There is nothing wrong with that, but it is incompatible with both the new module and the old templates. Consequently, the template has to convert the gender information into a new format, through a conditional which then forwards it on to the templates {{m|f}}. What I am proposing is to allow each individual template to specify in its own terms how multiple genders are to be indicated, and to expose a single interface on the Lua side, using a table of strings. An example would be Module:nl-head, which has a g2= parameter. If a template decides to combine multiple genders into one parameter, then it also carries the responsibility of splitting/interpreting its parameter before passing it on to Module:gender and number. So basically, I am arguing that splitting on commas/spaces should not happen in Module:gender and number, but in the modules that call it. Of course, if you think that we should rather get rid of multiple parameters for genders (therefore, remove the g2= parameter) and use a single string that contains all information encoded within it in some format, then that's different. But I'm not sure what benefits there would be in such an approach. The advantage to forcing each module to take "responsibility" for the split itself is that it is able to analyse the genders itself and perhaps add categories. For example, both {{nl-noun}} (through Module:nl-head) and {{sl-noun}} check to see whether the gender is correct; such a check would be more difficult if the whole multi-gender parameter is passed verbatim to Module:gender and number, and would probably mean that the calling module has to split the string anyway to get the information from it, which somewhat defeats the purpose of deciding to let Module:gender and number perform the split. —CodeCat 19:35, 9 March 2013 (UTC)
Re: {{es-noun|mf}}: If we want to keep these sorts of ad hoc notations, then fine, but it would be better to write {{es-noun|m,f}}. This way it's easy for all templates to do it the same way — a way that's (hopefully1) easy for users to remember.
Re: g2=: Absolutely I think this is a bad user interface, incredibly inconsistent between otherwise-identical templates. We were restricted to these sorts of hacks when we were tacking multiple-gender support onto a system that already supported a single gender, but we aren't anymore! (Also, BTW, if you're going to be all microoptimization-obsessed beyond anything that could conceivably be measured, then I believe you should prefer splitting in Lua over templates that take multiple parameters.)
Re: {{nl-noun}} and {{sl-noun}}: If they perform the split anyway, then there's nothing to discuss; they can pass the resulting table into export.format, exactly as you'd already planned. (Note that your argument applies just as well to your existing list approach: both {{nl-noun}} and {{sl-noun}} have to loop over the gender specifications to validate them, which defeats the purpose of letting Module:gender and number handle the looping!)
1. Speaking of users, we should ask them about this. I mean, I already know what Stephen will say, and if you start the discussion then I already know what DCDuring will say, but we should ask normal editors, too.
Ruakh 20:12, 9 March 2013 (UTC)
In that case, I think supporting a single gender parameter for all languages is a good idea. But I love to be nitpicky so I am not sure if I like separating them with commas. How about "m/f" instead of "m, f"? Keep in mind that while it may be nice to have the gender code entered the same way as it's displayed, there's no guarantee that the current module will always display "m, f". Maybe we will decide someday that we prefer "m or f" instead. —CodeCat 20:21, 9 March 2013 (UTC)
Yeah, I'm not married to commas. One thing that I don't like about my single-parameter approach is that it essentially creates a mini-language with two infix operators, so it needs to be obvious at a glance which operator has higher precedence. I'm not sure that commas meet that test: is it obvious that m,f-p means "{m},{f-p}" and not "{m,f}-{p}"? I'm not sure. A better option might be semicolons: m;f-p, maybe? Or maybe it's really not possible without spaces: m f-p or m, f-p or m; f-p or whatnot. One advantage of commas, of course, is that as long as we do display them with commas, the commas will be the easiest operator to remember. —Ruakh 21:21, 9 March 2013 (UTC)
I don't think it has to be obvious, as long as it's consistent. —CodeCat 22:52, 9 March 2013 (UTC)
(I'm a bit late to the party, I guess, and haven't even looked at the code y'all're discussing, so am commenting based only on what I've gleaned from the discussion here (and what minimal intelligence I can lay claim to).) How about m;f,p to code "masculine; feminine plural"? I think that's easy for non-coders to remember, as it sort-of matches normal English usage. Even better, how about m,fp or m;fp — but only if concatenation without delimiters is possible, which I don't know.​—msh210 (talk) 04:26, 10 March 2013 (UTC)
Re: concatenated without delimiters: That would be awkward, because not all the templates in Category:Gender and number templates have single-letter names. I don't see any truly ambiguous potential sequences, but it would still be icky. (Also, that category doesn't contain all possible codes. {{pf.}} and {{impf}} belong to essentially the same class, and CodeCat is now hoping to introduce an, in, and pr.)   But comma-and-semicolon seems fine to me. —Ruakh 07:02, 10 March 2013 (UTC)
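msh210's notation can be parsed with two levels of splitting. The sketch below (function name hypothetical) splits on ";" first and then on ",", which makes ";" bind more loosely and sidesteps the precedence question raised for m,f-p.

```lua
-- Parses e.g. "m;f,p" into { {"m"}, {"f", "p"} }:
-- ";" separates alternatives, "," joins the parts of one specification.
-- Whitespace is ignored and empty pieces are skipped.
local function parse_gender_spec(spec)
    local alternatives = {}
    for alt in string.gmatch(spec, "[^;]+") do
        local parts = {}
        for part in string.gmatch(alt, "[^,%s]+") do
            table.insert(parts, part)
        end
        if #parts > 0 then
            table.insert(alternatives, parts)
        end
    end
    return alternatives
end
```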
Perfect and imperfect can be separated from the others, because they are used for a different part of speech. At least, I'm not aware of any verb that has gender. Verb forms may, but we indicate that in the form-of definition rather than on the headword line, and no verb lemma has a single gender as far as I know. —CodeCat 13:30, 10 March 2013 (UTC)
It's true that we're unlikely to combine {{pf.}} ("perfective") or {{impf}} ("imperfective") with {{m}} or {{p}}, but we probably want the same module to handle them, for a few reasons:
  1. {{t|xx|foobar|m}} and {{t|xx|foobar|impf}} should both work, and {{t}} shouldn't need to examine its argument to try to decipher what module handles it.
  2. We want user input to be handled analogously: if m;f means "masculine or feminine", and pf and impf mean "perfective" and "imperfective" (respectively), then pf;impf should mean "perfective or imperfective".
  3. We want presentation to be analogous.
Incidentally, your pr code ("personal") suggests a point of possible overlap between noun classes and verb classes, though that doesn't really matter in and of itself.
Ruakh 23:24, 10 March 2013 (UTC)
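A sketch of the single-module idea in reasons 1 and 2: one lookup table holds both gender/number codes and aspect codes, and one parser treats m;f and pf;impf identically. The codes come from this discussion (plus n), but the table name, the kind values, and the function are assumptions.

```lua
-- Hypothetical single table for every code the module understands: gender,
-- number, and aspect codes side by side. The "kind" values are illustrative.
local KNOWN_CODES = {
    m = "gender", f = "gender", n = "gender",
    p = "number",
    pf = "aspect", impf = "aspect",
}

-- Splits a specification such as "m;f" or "pf;impf" on ";" and looks every
-- code up in the same table, so a caller like {{t}} never has to decide which
-- kind of code it was given. Returns the codes and whether all were known.
local function check_codes(spec)
    local codes, all_known = {}, true
    for code in string.gmatch(spec, "[^;%s]+") do
        table.insert(codes, code)
        if KNOWN_CODES[code] == nil then
            all_known = false
        end
    end
    return codes, all_known
end
```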

Final-removing Lua function

Do we have a Lua function that takes a single Hebrew-script word and spits it back out unchanged if it contains no finals, but otherwise converts the finals to their medial forms? I could use one for Yiddish, and I imagine it could be quite helpful for Hebrew as well. Please note that the requirements will be slightly different, though, because in Yiddish the medial form of ף is פֿ. Thanks! —Μετάknowledgediscuss/deeds 03:47, 15 March 2013 (UTC)

I've now created Module:yi-utilities with such a function. You might also look through Module:he-utilities and see if there's anything there that you want to appropriate. (It does have a function to convert an individual letter from medial-or-final to medial, but not one that accepts an entire word. I guess there's no reason it couldn't.) —Ruakh 17:21, 15 March 2013 (UTC)
Excellent! Thanks! —Μετάknowledgediscuss/deeds 19:38, 15 March 2013 (UTC)
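A minimal plain-Lua sketch of such a function (not the actual code in Module:yi-utilities, which isn't shown here; names are hypothetical). It replaces every final letter with its medial counterpart, so a word without finals comes back unchanged; the Yiddish table differs only in mapping ף to פֿ (pe plus rafe, U+05BF). Each Hebrew letter is a fixed two-byte UTF-8 sequence containing no Lua pattern magic characters, so plain string.gsub is safe here; in Scribunto, mw.ustring.gsub would also work.

```lua
-- Final -> medial letter map for Hebrew. Keys are used directly as patterns,
-- which is safe because they contain no Lua magic characters.
local HEBREW_FINALS = {
    ["ך"] = "כ",
    ["ם"] = "מ",
    ["ן"] = "נ",
    ["ף"] = "פ",
    ["ץ"] = "צ",
}

-- Yiddish: same, except final fe corresponds to pe + rafe.
local RAFE = "\214\191"  -- U+05BF HEBREW POINT RAFE, as UTF-8 bytes
local YIDDISH_FINALS = {}
for final, medial in pairs(HEBREW_FINALS) do
    YIDDISH_FINALS[final] = medial
end
YIDDISH_FINALS["ף"] = "פ" .. RAFE

-- Replaces every final letter in word with its medial form.
local function finals_to_medials(word, map)
    for final, medial in pairs(map or HEBREW_FINALS) do
        word = string.gsub(word, final, medial)  -- keep only the new string
    end
    return word
end
```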

Some advice?

I'm working on Module:ca-head, and I have a question about the make_plural function. The way it should work is like this: each replacement is tried in sequence, and as soon as a replacement is made, the result is returned. It works already, but it seems like a rather bad way to do it because each possibility has to be matched twice: first to see if it's in the string, and then again to do the actual replacement. Would you know of a more elegant way to do this, one that automatically "aborts" all remaining possibilities once a successful replacement is made? —CodeCat 21:30, 20 March 2013 (UTC)

I think "elegant" is subjective, but probably the tersest approach is to write a helper function that handles an arbitrary number of non-cascading substitutions — maybe something like this:
local function ending_swapper(base, ...)
    -- The arguments after base are (pattern, replacement) pairs; try them in
    -- order and return the result of the first one that matches at the end.
    local swaps = { ... }
    for i = 1, #swaps, 2 do
        local ret, n = mw.ustring.gsub(base, swaps[i] .. '$', swaps[i+1])
        if n > 0 then
            return ret
        end
    end
    return nil  -- no pattern matched
end
and then use it something like this:
local function make_plural(base, gender)
    local ret = ending_swapper(base, "ça","ces", "ca","ques", "qua","qües", "ja","ges", "ga","gues", "gua","gües", "a","es")
    if ret then return ret end
    ret = ending_swapper(base, "à","ans", "[èé]","ens", "([gq])uí","%1uins", "([aeiou])í","%1ïns", "í","ins", "[òó]","ons", "ú","uns")
    if ret then return ret end
    if gender:find("^mf?$") then
        -- stressed vowel + s, or a final sibilant: plural in -os
        ret = ending_swapper(base, "às","asos", "[èé]s","esos", "([gq])uís","%1uisos", "([aeiou])ís","%1ïsos", "ís","isos", "[òó]s","osos", "ús","usos", "[çsxz]","%0os")
        if ret then return ret end
        if base:find("sc$") or base:find("st$") or base:find("xt$") then return base .. "s", base .. "os" end
    end
    if gender == "f" then
        if base:find("s$") then return base end
        if base:find("sc$") or base:find("st$") or base:find("xt$") then return base .. "s", base .. "es" end
    end
    return base .. "s"
end
. . . which isn't an all-or-nothing deal. For example, you could take the concept of having a helper function that calls gsub and that returns nil when there's no match, but instead of taking many arguments at once, you could chain the calls like return h(base, "ça", "ces") or h(base, "ca", "ques") or ... or (gender:find("^mf?$") and (h(base, "às", "asos") or h(base, "[èé]s", "esos") or ...)) or .... Or whatever.
Ruakh 06:52, 21 March 2013 (UTC)
It does look like an ok solution, not the clearest one though. Terseness is nice but it shouldn't be detrimental to code clarity. I do like your or-solution though... that kind of fits my idea of "elegant" because it uses the language's own idioms. I'll see what I can do. Thank you. —CodeCat 13:42, 21 March 2013 (UTC)
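For comparison, here is a sketch of the chained-or idea applied to just the first group of endings from make_plural. It is plain Lua, so it uses string.gsub with literal endings only; character classes such as [èé] would not work byte-wise on UTF-8 there. The names h and feminine_a_plural are hypothetical. Because or short-circuits, evaluation stops at the first helper call that returns a string, and base .. "s" is the fallback.

```lua
-- Hypothetical helper: replace `ending` at the end of base, or return nil.
-- Endings are literal (no character classes), so multibyte letters such as
-- ç and ü are matched safely as byte sequences.
local function h(base, ending, replacement)
    local ret, n = string.gsub(base, ending .. "$", replacement)
    if n > 0 then
        return ret
    end
    return nil
end

-- First group of rules from make_plural, chained with `or`.
local function feminine_a_plural(base)
    return h(base, "ça", "ces")
        or h(base, "ca", "ques")
        or h(base, "qua", "qües")
        or h(base, "ja", "ges")
        or h(base, "ga", "gues")
        or h(base, "gua", "gües")
        or h(base, "a", "es")
        or base .. "s"
end
```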

April 2013

A request for your input

Can you have a look at Module talk:ru-translit#How can this be used from another Lua module?? —CodeCat 12:57, 11 April 2013 (UTC)

technical question

Hi. Could you tell me which parameter can be added to a URL to display the names of MediaWiki messages instead of their normal text, for example (PAGETITLE) instead of the page title? I can't remember it, and I can't find it here. Maro 23:55, 14 April 2013 (UTC)
(uselang=... specifies an interface language; for example, you can use uselang=pl to view the interface in Polish, though of course then you lose the benefit of all our nicely customized English messages with helpful links. qqx is a "private use" code — it will never be assigned to a real language — and the uselang feature uses it for the purpose that you describe.) —Ruakh 04:36, 15 April 2013 (UTC)

Your name came up

Your name came up in an IRC discussion regarding the DICT project. Just sayin'... Would you have an interest if things started to move forward on this one? - Amgine/ t·e 15:47, 16 April 2013 (UTC)

If you are interested in mentoring someone on this, we could move the project to one of those that is "Featured" and thus more likely to get a student interested. -- MarkAHershberger(talk) 18:07, 16 April 2013 (UTC)
It sounds like a valuable project, so I hope someone steps up, but I'm not sure that someone should be me. What exactly is involved in being a mentor? Especially — how much time would I be expected to commit? A GSOC is a big deal for a student — it's a lot like an internship — so it would really be unfair to him/her if I (or anyone) agreed to mentor but didn't actually commit the necessary time. (I imagine it could also potentially damage Wiktionary's ability to get GSOC students in the future.) I recently started working at the world's largest online retailer, and I love it, but the rumors are really true about the incredibly long hours that developers put in. I have way, way less spare time now than I used to. :-P   —Ruakh 06:38, 17 April 2013 (UTC)
Congrats on landing the job.​—msh210 (talk) 17:53, 19 April 2013 (UTC)
Thanks! —Ruakh 17:09, 20 April 2013 (UTC)
Yes, my congratulations as well!
As for the mentoring schtick... While I would love for you to be able to work on the DICT project, we don't have a student working on that one. On the other hand, a much lower-time-cost project *does* need a community liaison: Bugzilla and GSOC application draft. This project is to build a pronunciation recording tool for Wiktionary, so we can have a simple method for our readers to contribute a recording of a word pronunciation. There is a WMF developer to do the software-side mentoring, but the student needs someone who is a regular part of the Wiktionary GP community to advise and go back-and-forth to the Grease Pit regulars for input/reporting. I think it should be about a half-hour or so per week, if the application is approved, mostly e-mailing. - Amgine/ t·e 15:28, 29 April 2013 (UTC)



I don't know if this tool can become really useful but it can definitely get better. Could you check, add missing letters (if any) and transliterate the diacritics, please? If the pronunciation differs depending on the position, could you put a short comment, please? Module:ar-translit is a bit more advanced but it can't do a perfect job for the obvious reasons, e.g. اَلْلُغَةُ ٱلْعَرَبِيَّةُ: al-luḡatu l-ʿarabiyyatu

I can't read Hebrew but it may be easier for me to see what letters are used with the tool. E.g. the call on the module with מִפְעָל currently produces: Lua error: bad argument #1 to 'toNFD' (string expected, got table). What should it be? Do you think it's possible to transliterate fully vocalised Hebrew in a more or less accurate way? --Anatoli (обсудить/вклад) 23:03, 17 April 2013 (UTC)

Since there's no way for it to distinguish between a kamatz gadol (transliterated as "a" on Wiktionary according to WT:AHE) and a kamatz katan ("o"), the automated transliterations would not be perfect, so it would probably be preferable to fill them in manually, no? --Yair rand (talk) 23:29, 17 April 2013 (UTC)
Of course, a manual override is preferable where there are ambiguities, but could there be default values, "a" and "o"? Another option is to follow Persian and Arabic and put "(a/o)", "(e/ei)" in brackets with a slash, so that people know they have to decide which one is right, as I did with ["פ"]='(p/f)', ["ף"]='(p/f)'. --Anatoli (обсудить/вклад) 23:45, 17 April 2013 (UTC)
If the nikudot and such are included, then those particular letters don't need to be left ambiguous: פ and ף are "f", and פּ and ףּ are "p". However, the current code seems to only allow for single letter transliteration points... --Yair rand (talk) 23:56, 17 April 2013 (UTC)
If you know Lua and Hebrew, you could fix it or make it better. I just thought we need a tool for Hebrew and set up a really basic structure, so that a working module could be created. --Anatoli (обсудить/вклад) 00:02, 18 April 2013 (UTC)
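A Python sketch (not the actual Lua module) of the two points raised above: choose p/f from the presence of a dagesh instead of emitting "(p/f)", and leave qamats as "(a/o)" by default, since gadol and katan are written identically. The letter and vowel tables are hypothetical and deliberately incomplete. One wrinkle is that Unicode normalization reorders the points under a letter by combining class (a patah sorts before a dagesh), so this sketch groups each letter with all of its points rather than matching fixed two-character sequences.

```python
import unicodedata

DAGESH = "\u05bc"

# Hypothetical, deliberately tiny table: letter -> (with dagesh, without dagesh).
# Letters not listed are passed through unchanged.
CONSONANTS = {
    "\u05d1": ("b", "v"),   # bet
    "\u05db": ("k", "kh"),  # kaf
    "\u05da": ("k", "kh"),  # final kaf
    "\u05e4": ("p", "f"),   # pe
    "\u05e3": ("p", "f"),   # final pe
    "\u05de": ("m", "m"),   # mem
    "\u05dc": ("l", "l"),   # lamed
    "\u05e2": ("", ""),     # ayin (left silent in this sketch)
}

# Vowel points; qamats stays ambiguous by default.
VOWELS = {
    "\u05b4": "i",      # hiriq
    "\u05b5": "e",      # tsere
    "\u05b6": "e",      # segol
    "\u05b7": "a",      # patah
    "\u05b8": "(a/o)",  # qamats: gadol "a" vs. katan "o"
    "\u05b9": "o",      # holam
    "\u05bb": "u",      # qubuts
}


def translit(word: str) -> str:
    """Transliterate pointed Hebrew one letter-plus-points cluster at a time."""
    chars = unicodedata.normalize("NFD", word)
    out = []
    i = 0
    while i < len(chars):
        base = chars[i]
        i += 1
        # Collect every point attached to this letter, whatever order NFD left them in.
        marks = set()
        while i < len(chars) and unicodedata.combining(chars[i]):
            marks.add(chars[i])
            i += 1
        hard, soft = CONSONANTS.get(base, (base, base))
        out.append(hard if DAGESH in marks else soft)
        out.extend(VOWELS[m] for m in sorted(marks) if m in VOWELS)
    return "".join(out)


word = "\u05de\u05b4\u05e4\u05b0\u05e2\u05b8\u05dc"  # mem-hiriq, pe-sheva, ayin-qamats, lamed
print(translit(word))  # mif(a/o)l
```

Because the dagesh is read from the whole cluster, a precomposed pe-with-dagesh (U+FB44) and a decomposed pe followed by a dagesh give the same "p".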


I know you're really busy, I was just hoping that you could run Rukhabot a bit more. It's been more than 3 weeks since the last run, and we are slowly becoming somewhat dependent on bots like these. Thank you, and however much time your job is eating away, I hope you're enjoying it! —Μετάknowledgediscuss/deeds 05:47, 24 April 2013 (UTC)

Which one? (I assume you mean either interwikis or trans-links?) —Ruakh 02:51, 25 April 2013 (UTC)
Both are good, but trans-links are, AFAIK, the sole domain of Rukhabot, and thus more important. —Μετάknowledgediscuss/deeds 03:05, 25 April 2013 (UTC)
Thanks for this, as well as for drastically improving the layout of {{af-personal pronouns}}. I wouldn't have thought to arrange it thus, but I do believe it looks better now. —Μετάknowledgediscuss/deeds 01:41, 3 May 2013 (UTC)

Links to Hebrew-script termsEdit

Category:term cleanup/sc=Hebr contains a list of pages that use {{term}} with sc=Hebr but without a language. These should have a language instead of a script, but a bot can't automatically replace them (unlike, say, Gothic script) because there are several languages that are written in Hebrew script. Could you help? —CodeCat 14:03, 27 April 2013 (UTC)

May 2013Edit

Python questionEdit


I know you're busy, and will perfectly understand your not replying (or replying in the negative) to this, but I have a pywikipediabot question about regexes: I'm trying to do in Python what a JavaScript script you once wrote does. It's at [2]; any ideas you can provide would be much appreciated.​—msh210 (talk) 05:07, 3 May 2013 (UTC)

Oh, that message mentions That's described at [[mw:Manual:Pywikipediabot/]]: essentially, it's a bunch of so-called fixes, each of which is a hash that specifies a regex replacement and an edit summary.​—msh210 (talk) 05:12, 3 May 2013 (UTC)

Okay, I've got an answer there; I'll try it out (I've no time to now); so, meanwhile, completely ignore the above.​—msh210 (talk) 20:34, 3 May 2013 (UTC)

Forced user renames coming soon for SULEdit

Hi, sorry for writing in English. I'm writing to ask you, as a bureaucrat of this wiki, to translate and review the notification that will be sent to all users, also on this wiki, who will be forced to change their user name on May 27 and will probably need your help with renames. You may also want to help with the pages m:Rename practices and m:Global rename policy. Thank you, Nemo 13:09, 3 May 2013 (UTC)

Your replacement for Template:contextEdit

I came across {{plural}}, which seems rather redundant to me compared to {{p}} - it literally just contains ''plural'', so there is no advantage over just typing that out, same amount of characters. I was looking at the transclusions and noticed that a significant number of them are caused by {{context}}. That made me think about your efforts to replace the template with something else. Now that there is Lua, I presume you'd want to write it in Lua instead. Do you think you could try to do that anytime soon? I could also try if you don't have the time. —CodeCat 19:29, 6 May 2013 (UTC)

We may start editing and/or deleting {{label}} and its related subtemplates soon. But I can imagine that you might want to keep it (in your personal userspace) for some future use, so I'm asking first if it's ok. —CodeCat 17:06, 4 June 2013 (UTC)
The copy in my personal userspace is independent of the one in Template:-space — Liliana copied rather than moving — so you needn't worry. (As for the copy in my userspace, I might keep it, at least for a while, just so as not to break the discussions about it.) —Ruakh 17:46, 4 June 2013 (UTC)

Some errors in Module:ko-translitEdit


Could you please fix the module when you have time? It has errors at the moment. (They weren't caused by my last edit; I know it was working after that, so it's something else.) --Anatoli (обсудить/вклад) 01:06, 8 May 2013 (UTC)

So . . . many . . . talk-page . . . comments. Yours wins for the most interesting debugging problem. Previously, even global variables were not shared between Scribunto modules that import-ed each other, but that must have changed, I guess. Inserting local, so that Module:ko-hangul and Module:ko-translit weren't both messing with the same p, fixed the issue. (The test-cases are still failing, but only because the transliteration scheme implemented by the module differs from the one the test-cases expect. You should have no difficulty fixing that.) —Ruakh 08:33, 11 May 2013 (UTC)
Thank you very much! It will take me some time even to understand what each section does. --Anatoli (обсудить/вклад) 09:05, 11 May 2013 (UTC)

User_talk:Rukhabot#Autotranslit to write transliterations from modules where it's missing?Edit

Hi Ran,

I've posted in User_talk:Rukhabot#Autotranslit to write transliterations from modules where it's missing?. Do you think it's feasible?

Also, Category:Hebrew translations lacking transliteration may need attention of active Hebrew editors. --Anatoli (обсудить/вклад) 01:15, 21 May 2013 (UTC)

June 2013Edit

Some JS helpEdit

Currently, the translation editor substitutes language templates, but it should really substitute a call to Module:language utilities instead. Aside from that, it adds a language name to {{trreq}} but it should really use a code. I tried fixing it but things didn't work right so I reverted it and asked Yair Rand for help. Do you think you could try as well? —CodeCat 20:25, 2 June 2013 (UTC)

Sorry, I spent all my spare Wiktionary coding time today on MediaWiki:Gadget-PatrollingEnhancements.js, which I feel more responsible for. I might have time to look at this later in the week — but it looks like Yair rand has done this now, right? Or is there anything that still needs to be done? —Ruakh 03:28, 3 June 2013 (UTC)



Something happened in the last couple of days. There's no green colour and the tool can't be used. Please fix when you can. --Anatoli (обсудить/вклад) 04:26, 3 June 2013 (UTC)

Yeah, it looks like CodeCat (talkcontribs) changed the structure of {{t}}. Previously, the script template was inside the link-text, but now, the link is inside the script template. It might be a quick fix. —Ruakh 04:33, 3 June 2013 (UTC)
Fixed now, hopefully. (Not tested. In future, please link to an example.) —Ruakh 04:41, 3 June 2013 (UTC)
This seemed like such an uncontroversial change that I didn't think it would have any effect. If our scripts are that sensitive to changes like this, I wonder whether they should be changed to be more robust... —CodeCat 11:56, 3 June 2013 (UTC)
Changing the structure of the HTML is actually a very major change, from a script's standpoint. It can be minor in that it doesn't affect the browser's rendering, but a script that attempted to examine the rendered output would actually be much more fragile, rather than much more robust. —Ruakh 14:05, 3 June 2013 (UTC)
I know, but I noticed that the scripts select elements based on their parents. Couldn't they select based on a class attribute instead? Then things like this wouldn't happen anymore. And if it has to be this way, please document it on the documentation pages of the translation templates or with comments in their wikicode. This wouldn't have happened if it had been more obvious that something depended on the particular ordering! —CodeCat 14:41, 3 June 2013 (UTC)
We don't control the class attributes on links; they're generated by the software. —Ruakh 14:57, 3 June 2013 (UTC)
It's not possible to write out the link in HTML? —CodeCat 15:35, 3 June 2013 (UTC)
I don't believe so, no, but you're welcome to experiment. (But keep in mind that the script needs to detect whether the linked-to page is a redlink, and we don't want to use {{#ifexist:…}} because there are sometimes a gazillion translations on a single page; so even if you do find some way to write out the link in HTML, I expect that it'll be of more theoretical than practical interest, at least for this use-case.) —Ruakh 02:18, 4 June 2013 (UTC)
Thank you again but there are some problems still, see my last message at User_talk:Ruakh/Tbot.js#The_green_colour_is_gone. --Anatoli (обсудить/вклад) 04:32, 7 June 2013 (UTC)
Not sure if you saw my reply but the problem persists. --Anatoli (обсудить/вклад) 06:31, 7 June 2013 (UTC)

ברוך הבאEdit

The code of the headword line of this entry looks... less than appealing. I think it inflects like a regular Hebrew adjective, so do you know if it's possible to use an adjective template for it? —CodeCat 22:55, 23 June 2013 (UTC)

It's a sentence — barúkh (b'rukhá/ím/ót) is the predicate, habá (haba'á/ím/ót) the subject — so I'm not sure it's right to say that it "inflects". But you're clearly not alone in thinking that it inflects like a regular Hebrew adjective, since the current code must have been copied from one of the Hebrew adjective templates. (The giveaway is the "indef", which makes no sense here: adjectives inflect for definiteness, but that makes no sense as a description for a complete sentence. And habá is actually definite.) —Ruakh 06:50, 25 June 2013 (UTC)
So it's ok if I remove "indef" and convert the rest to use {{head}}? —CodeCat 11:38, 25 June 2013 (UTC)
It's fine by me. Though sadly, I don't think {{head}} has good support for the case where the display-text for an inflected form (not sure that's the right term in this case, but you know what I mean) is different from the link-target. —Ruakh 04:58, 27 June 2013 (UTC)

templates don't link boldly to words spelled with apostrophes, hyphens, asterisksEdit

A little birdy (actually, a little kitty) mentioned that you might know how to fix templates that don't link to or embolden links to words spelled with apostrophes, hyphens, or asterisks. If you have the time, could you add your wisdom and/or speculation to this discussion? - -sche (discuss) 05:19, 27 June 2013 (UTC)

User talk:Ruakh/Tbot.js#Gender of nouns missing 2Edit


Please kindly take a look at this. --Anatoli (обсудить/вклад) 05:26, 27 June 2013 (UTC)

July 2013Edit


Are you still creating entries? ;) Would you mind creating the one for Lua error in Module:links/templates at line 49: The parameter "head" is not used by this template.? I'm spending this month researching and expanding the WP article on that prophet, and setting up a Portal on Wikisource as well. It would be nice to have the Classical Hebrew entry for his name as well. --EncycloPetey (talk) 01:54, 10 July 2013 (UTC)

Done. Ruakh 20:36, 13 July 2013 (UTC)


Is this the correct term for biomass? I dug this up out of the Hebrew Wikipedia, and since I'm no expert in Hebrew, I thought I'd ask you if this was the correct term. If it is, could you add it to the translations for biomass? Thanks, Razorflame 01:02, 14 July 2013 (UTC)

Done. Ruakh 02:50, 14 July 2013 (UTC)
Thanks. I figured you'd be the best person to ask since you know Hebrew so well :) Razorflame 02:52, 14 July 2013 (UTC)

Template for words with both prefixes and suffixesEdit

Is there a template for words that have both a prefix and a suffix? Thanks, Razorflame 02:54, 14 July 2013 (UTC)

Keep in mind that just because a word has both a prefix and a suffix, that doesn't mean they were added "simultaneously". For example, unhappiness is better viewed as unhappy + -ness than as un- + happy + -ness; so, it should just use {{prefix}}. But to answer your question — yes, we have {{confix}}. It's mostly used in cases where the entire word is formed from two bound morphemes, e.g. telescope, but it also supports cases where a prefix and a suffix were added "simultaneously" to a single word. —Ruakh 03:18, 14 July 2013 (UTC)
Thanks for the information :) Razorflame 03:21, 14 July 2013 (UTC)

Weird character insertion by RukhabotEdit

It happened in December 2012, so maybe you already know about this, but look at this edit: [3]. —Μετάknowledgediscuss/deeds 05:54, 15 July 2013 (UTC)

Eep, I was not aware of that. Thanks for letting me know.
Several dozen other pages have that character; in most cases it was a human editor who added it somehow, but in about twenty cases, it was the same as [[ironware]]: Rukhabot trying to add 'ø' to an entry that didn't already contain non-ASCII characters. I'll fix the latter group, at least.
Thanks again,
Ruakh 06:21, 16 July 2013 (UTC)
I've now fixed the underlying issue in Rukhabot, by calling utf8::upgrade on the page-text before doing anything to it. —Ruakh 14:26, 27 August 2013 (UTC)

Random entry by languageEdit

This is a pretty involved thing to request, so I don't realistically expect it of you... but Hippietrail is giving up on his Toolserver stuff, and doesn't seem to be willing to migrate it so we can get the random entry by language feature working again. I think your name came up last time, but you were too busy, and presumably still are. Might as well hope, though, since I don't know who else to ask. —Μετάknowledgediscuss/deeds 22:54, 22 July 2013 (UTC)

Sorry, I don't think I'll have time to do that anytime soon. —Ruakh 06:48, 24 July 2013 (UTC)


FYI, Template:t-SOP may be deleted, and Template:t0 may come into use, pursuant to this discussion. - -sche (discuss) 20:16, 23 July 2013 (UTC)

Well actually, {{}} and {{t0}} probably aren't needed anymore either... —CodeCat 20:20, 23 July 2013 (UTC)
Thanks for letting me know. I'll be sure to check for updates to {{t}}'s documentation before the next time I run the translation-bot. —Ruakh 06:48, 24 July 2013 (UTC)

August 2013Edit


I notice you're still using Template:onym. I just wanted to make sure you were still aware that it is being orphaned pursuant to WT:RFDO#Template:onym, and will be deleted soon. Adjustments have been made so that Template:he-onym continues to work as before... although now that modules have enabled templates to automatically deduce what diacritic-less pagenames to link to when diacritical words are put in, it might make sense to modify {{he-onym}} to take advantage of that. (Specifically, I notice that the last of these is currently the only one not to link to גלשן שלג:

גַּלְשַׁן שֶׁלֶג‎, Template:he-onym, גַּלְשַׁן שֶׁלֶג‎, Template:he-onym.)

- -sche (discuss) 05:21, 1 August 2013 (UTC)
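A minimal Python sketch of deducing a diacritic-less link target, assuming the rule is simply "decompose, then drop every nonspacing mark"; this is not necessarily how the Wiktionary modules do it. For Latin macrons the rule is exact. For pointed Hebrew it strips niqqud, dagesh and shin/sin dots, but it is only a default: the unpointed spelling sometimes adds letters (such as a vav) that the pointed form lacks, so an explicit link target can still be needed.

```python
import unicodedata


def strip_diacritics(term: str) -> str:
    """Drop every nonspacing mark (Unicode category Mn) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", term)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever base characters remain.
    return unicodedata.normalize("NFC", kept)


print(strip_diacritics("l\u016bx"))  # lux
# gimel-dagesh-patah lamed-sheva shin-dot-patah final-nun, space, shin-dot-segol lamed-segol gimel
print(strip_diacritics("\u05d2\u05bc\u05b7\u05dc\u05b0\u05e9\u05c1\u05b7\u05df \u05e9\u05c1\u05b6\u05dc\u05b6\u05d2"))
```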

Thanks! —Ruakh 14:20, 27 August 2013 (UTC)


Just wanted to thank you for your work on Wiktionary templates. Best wishes. (Incidentally, do you find the Template Sandbox helpful?) Sharihareswara (WMF) (talk) 21:22, 21 August 2013 (UTC)

Raw links and alt forms in Hebrew entriesEdit

Many templates can remove diacritics now, and those that do now add entries to a category if they notice that the entry name parameter is the same as the alt form parameter with the diacritics removed, which makes it redundant. This includes Hebrew, but the Hebrew entries use a rather strange format in many headwords with piped links embedded directly into the template parameter. This makes it hard for me to understand what's going on and fix the situation. Can you have a look? The category for Hebrew is Category:Link alt form tracking/redundant/he, but the software is still updating it and adding more entries to it over time. —CodeCat 20:31, 28 August 2013 (UTC)

I don't understand what Category:Link alt form tracking/needed is for. (Why have you chosen not to document these categories?) —Ruakh 23:07, 29 August 2013 (UTC)
It seemed like a good idea at first, but it probably didn't work out right. —CodeCat 23:14, 29 August 2013 (UTC)
So I should just ignore the 'needed' cat, and focus on 'redundant'? O.K. But before I make any changes, can you confirm that there's consensus for the view that we should remove the "redundant" diacritic-less versions? (I ask because such removals could have unintended effects, e.g. detrimental impact on site search. I don't want to set about making lots of edits to do this, if it hasn't at least been discussed.)
Also, please document these categories. (I almost feel that cleanup categories with non-obvious meanings should be speedied, since they're as likely to confuse and frustrate as to promote whatever cleanup they're intended for.)
Ruakh 18:54, 30 August 2013 (UTC)
Not all cleanup categories are really useful for general editors though. Some are used mostly to feed bots, like these are. In any case, there should be a consensus because why else did we add this feature (of removing diacritics) if we were opposed to using it? That would be a bit silly. Mglovesfun made a request for this in the GP and I am fulfilling it now. —CodeCat 21:57, 30 August 2013 (UTC)
There's a difference between "it is acceptable to omit the diacriticless version" and "it is preferable to omit the diacriticless version". (It also may depend on the language. For languages like Latin, where the diacriticless version is always fully predictable from diacriticked version, it may make sense to always omit the former, whereas for languages like Hebrew, where this is not the case, it may make sense to prefer that both versions always be present for consistency.)
Incidentally, there's also a difference between "we added this feature" and "we had consensus to add this feature", and while I've been assuming the latter is true, the fact is that I don't actually know.
Regardless, I guess you've answered my question: you don't know or care whether your anti-diacriticless-version project has consensus. So, I don't think I'm interested in working on it. Sorry.
Ruakh 03:17, 31 August 2013 (UTC)

September 2013Edit


Wiktionary:Beer parlour/2013/August#Remove macronless forms from Latin links, Thread:User_talk:CodeCat/it-noun problem, User talk:SemperBlotto#Italian nouns, Wiktionary:Grease pit/2013/September#Template:term/t. —CodeCat 21:06, 11 September 2013 (UTC)

Thanks for the links! But most of those discussions don't seem to explicitly indicate that you plan to run a bot-task, or what you intend for it to do. :-/   And worse yet, some of those discussions show that not everyone is on board with the bot-task. (Bot-tasks don't need absolute 100% support — just as policy changes don't — but if you know that someone is already objecting to a bot-task, even if their stated reason is a bad one, you need to be more careful to ensure there's consensus.)
I'm sorry, I don't want to be too bureaucratic about this; I'm sure that you feel that everything you're doing is good, and I'm sure that if you sought consensus, many editors (including me) would support a good chunk of it (though not all). But since you've recently started some massive undertakings that clearly don't have consensus, and don't seem to be acknowledging that that's a problem, I'm not sure how else to address this situation. If you have any better ideas, I'm all ears. You're acting like you have carte blanche to do everything that strikes you as a good idea, and a few of the non-technically-minded editors seem to feel you're being a bit of a bully (doing it because you can get away with it because they don't have the technical know-how to stand in your way). For community reasons, even if not for technical reasons, this is a serious problem that you need to do something about.
Ruakh 22:00, 11 September 2013 (UTC)
Mglovesfun made a request for a bot task and I fulfilled it; nobody objected to it. The same with Semper: he explicitly assented, and Mglovesfun did implicitly as well. The only thing that is still under discussion is the change from {{m}} to {{g|m}}, but the debate there is about how best to fix the occurrences of the template where it should be integrated, not about adding {{g|m}} itself. I think DCDuring was the only one who objected to that in the past, on the grounds that he has to type more. :/ —CodeCat 22:10, 11 September 2013 (UTC)
O.K., let's separate these out:
  • Removing macrons from Latin links (Wiktionary:Beer parlour/2013/August#Remove macronless forms from Latin links) — this is fine, since it was explicitly and unambiguously mentioned beforehand in the BP, and no one objected. (I think you also started making the same change in other languages, which I don't think is fine.) If you could adjust the bot's edit-summaries to link to that BP discussion, I'd appreciate it. (That's not required by any policy, but it will help people understand an otherwise-undocumented edit, and make it clear where they should go to discuss it.)
  • Thread:User_talk:CodeCat/it-noun problem — this doesn't seem to relate to any of your bot edits, it's just a link to people being annoyed with pointless Lua-related changes you were making. I appreciate your honesty in linking to it, though. :-P
  • Changing e.g. {{it-noun|cicloalchen|m|e|i}} to e.g. {{it-noun||m||cicloalcheni}} (User talk:SemperBlotto#Italian nouns) — this is a multistep plan that involves breaking changes to the template, yet it doesn't seem to have been discussed anywhere but SB's talk-page. In particular, it wasn't discussed on the template's talk-page. Further, the current bot-run is changing from the specific format required by Template:it-noun/documentation to an undocumented format. (And honestly, I'm not totally sure how to read SB's response. In the context of the previous discussion, I wonder if it should be read as "I don't care enough to try to argue with you"? But I may be reading too much into it.)
  • Wiktionary:Grease pit/2013/September#Template:term/t — Keφr objected on the grounds that this should be combined with a different task; whether or not you agree with those grounds, it's an objection. A few commenters, while not explicitly writing "I object to this bot run", advocated alternatives to its stated purpose. (The bot run wasn't the actual subject of discussion, anyway, so you can't really require that people voice their objections there.) And you say that DCDuring has objected in the past because he dislikes this change. If you believe there is consensus for this bot-run, why not start a BP discussion to make sure?
Ruakh 23:19, 11 September 2013 (UTC)
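A hypothetical Python sketch of the parameter conversion in the third item, inferred only from the single example given there (old form: stem, gender, singular ending, plural ending; new form: gender plus the full plural). It is not Mewbot's code, and it ignores named parameters and any other call shapes the template may accept.

```python
import re

# Assumed old layout, inferred from one example:
# {{it-noun|STEM|GENDER|SG-ENDING|PL-ENDING}}
OLD_IT_NOUN = re.compile(
    r"\{\{it-noun\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\}\}"
)


def convert_it_noun(wikitext: str) -> str:
    """Rewrite the four-part call into the gender-plus-full-plural form."""
    def repl(m: re.Match) -> str:
        stem, gender, _sg_ending, pl_ending = m.groups()
        # The singular ending is dropped: the new form only needs the plural.
        return "{{it-noun||%s||%s}}" % (gender, stem + pl_ending)

    return OLD_IT_NOUN.sub(repl, wikitext)


print(convert_it_noun("{{it-noun|cicloalchen|m|e|i}}"))  # {{it-noun||m||cicloalcheni}}
```

Since an already-converted call has an empty stem, running the conversion twice leaves the text unchanged.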
The link to my talk page is more about making changes to the internal workings of the module. But those changes would entail changing the parameters as well (because as it stands now, two of the template's four parameters have no purpose and are just for show). That is what I proposed further down. Once I had thought of a way to make the change incremental (so that no entries would be left in a temporarily broken state) I proposed it to Semper and he seemed to be ok with it.
As for the change with {{g}}... I've noticed other editors doing this already too, like Kephir's edit to {{br-noun}}. If he was the main objector, that seems like a fairly clear assent for the change to me... —CodeCat 23:25, 11 September 2013 (UTC)
Re: first paragraph: I see; I'm sorry, my fault, I was completely misreading that thread. But I stand by my other comments about this bot-task; you're making breaking changes to this template, yet you didn't discuss it anywhere appropriate (such as the template's talk-page or Wiktionary talk:About Italian), and you didn't fix the documentation. (Fixing the documentation isn't directly relevant to running a bot — the question is whether the task has consensus, not whether it's documented — but there's a social problem here, and out-of-date documentation is part of that problem. It makes it harder for less-technically-adept editors to know what's going on and understand what [you've decided that] they're supposed to do.)
Re: second paragraph: I wouldn't say Kephir was "the main objector"; he was an objector. But regardless, the fact that he edited a template to use {{g}} does not revoke his right to object to a bot run. (Note, by the way, that he's not being hypocritical here: he gave a specific reason for his objection — he felt that {{g}}-ification should happen together with certain other improvements — and that specific reason does not apply to the edit you mention. If nothing else, you could probably win him over by restricting your bot to other cases where that reason doesn't apply.) (Incidentally, I don't actually agree with his reason for objecting. Yes, it would be better to do {{t}}-ification or {{head}}-ification at the same time; but if {{g}}-ification is a worthy goal in itself — which it isn't IMHO, but if it were — then there wouldn't be much reason to tie it to those other changes. But he's entitled to his view, and if we disagree with it, that's what discussion is for.)
Ruakh 05:15, 12 September 2013 (UTC)
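For a {{g}}-ification run like the one discussed above, a bot fix can be kept small and self-describing. The Python sketch below is loosely modelled on the regex-plus-edit-summary "fixes" mentioned earlier on this page, but the record layout and the apply_fix name are hypothetical, not pywikibot's API. It only touches bare {{m}}, {{f}} and {{n}}, and returns no summary when nothing changed, so no-op edits can be skipped.

```python
import re
from typing import Optional, Tuple

# Hypothetical fix record: regex replacements plus an edit summary.
# A real run should have its summary link to the discussion that established consensus.
G_IFY = {
    "summary": "Replace bare gender templates with {{g}}",
    "replacements": [
        (re.compile(r"\{\{(m|f|n)\}\}"), r"{{g|\1}}"),
    ],
}


def apply_fix(text: str, fix: dict) -> Tuple[str, Optional[str]]:
    """Apply every replacement; return the new text and a summary only if it changed."""
    new = text
    for pattern, replacement in fix["replacements"]:
        new = pattern.sub(replacement, new)
    return new, (fix["summary"] if new != text else None)


print(apply_fix("{{t|fr|chat}} {{m}}", G_IFY)[0])  # {{t|fr|chat}} {{g|m}}
```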

Mewbot and it-nounEdit

Hi there. CodeCat, via Mewbot and following discussion with me, is changing the format of the {{it-noun}} template. I don't think there is any need to block the bot, as it is doing the necessary work to modify most of the Italian nouns. Cheers. SemperBlotto (talk) 21:13, 14 September 2013 (UTC)

Hi, thanks for your message. The thing is, this wasn't discussed or mentioned on any page like Wiktionary:Beer parlour, Wiktionary talk:About Italian, or Template talk:it-noun, as required by the bot policy. (The bot policy makes an exception when "the task is so innocuous that no one could possibly object", but since this task is part of a breaking change to a widespread template, I don't think it qualifies.) CodeCat has apparently decided that this is personal on my side, so is making it personal on her side: rather than viewing discussion/consensus as required by policy, she prefers to view it as my personal whim, and is therefore refusing on principle to engage in it.
Also, incidentally, this isn't really related to the block, but the bot-edits are directly contrary to the documentation at {{it-noun}}. Since CodeCat refuses to update the documentation, I don't suppose you'd be willing to do so? Correction: she's now updated it.
Ruakh 02:28, 15 September 2013 (UTC)

MediaWiki:Common.js changesEdit

I saw this : [4]. Modifying the site common.js seems overkill if it only applies to two people. Why not change your own Common.js? Is there perhaps a technical reason? Dakdada (talk) 08:42, 19 September 2013 (UTC)

The intent is that it be enabled for everyone, but a similar script recently caused lots of problems when we tried to add it, so Liliana and I, and any other admins who choose to add themselves, are giving this one a whirl before turning it on for everyone. —Ruakh 14:20, 19 September 2013 (UTC)


Hi there. The {{also}} template seems to have lost the "hides the pagename if it is one of the parameters" feature. See, as an example, centreboards. SemperBlotto (talk) 10:37, 21 September 2013 (UTC)

It was me. I have been meddling with that template for a while. I added it back now (in a kinda hacky way), but why was it there in the first place? Keφr 10:56, 21 September 2013 (UTC)
I'm guessing it was there for convenience in cases when copying-and-pasting a single {{also}} across half a dozen entries. (And perhaps someone was also thinking that we might create subtemplates like {{also/centreboards}} that just called {{also}} with a fixed parameter-list, but if so, that never materialized.) —Ruakh 18:21, 21 September 2013 (UTC)

Deletion of déphlogistiguerEdit

Hello Ruakh. Following the failed RFV, can you also delete these three other forms: déphlogistiguée, déphlogistigués and déphlogistiguées? Thank you. — Xavier, 20:44, 21 September 2013 (UTC)

C'est fait (done). Μετάknowledgediscuss/deeds 04:00, 22 September 2013 (UTC)
Oops, thanks. I just went through the ones in the conjugation table; the inflected forms of the past participle didn't even occur to me. —Ruakh 04:54, 22 September 2013 (UTC)
I suspect there are more of these floating around; perhaps someone should try to find them (not sure how). Or we could just set templates like {{plural of}} to categorise if the page is a redlink, which should catch a bunch of them but will also catch some false positives. In either case, a human will have to look through them. —Μετάknowledgediscuss/deeds 05:34, 22 September 2013 (UTC)
Thank you for deleting those entries. I have checked that no other form of this verb has been missed. I agree that when deleting an entry, inflections are easy to overlook and may stay here for a while. That might explain why, a couple of times, I stumbled upon plurals of nonexistent French entries. However, each time the word looked valid to me. — Xavier, 10:10, 22 September 2013 (UTC)

Module talk:usexEdit

Please respond to Haplology's inquiry. DTLHS (talk) 20:27, 27 September 2013 (UTC)

Done. Thanks for the note. —Ruakh 20:51, 27 September 2013 (UTC)

New Lower Sorbian noun templateEdit

Hi Ruakh—Last January you helped me out by creating {{dsb-noun}}, which works great but seems very complicated and relies on several subpages. After playing around for a while, I've created {{dsb-noun/new}}, which as far as I can tell generates the same output as your {{dsb-noun}} but doesn't require any subpages. Could you take a look at my code and see if it's OK; does it fail to generate anything that yours does? Is it OK if I write over the old version with the new version and delete the subpages? Thanks! —Aɴɢʀ (talk) 11:49, 28 September 2013 (UTC)

The use of subpages was intentional, not because it's necessary, but because it allows the different bits of logic and formatting to be split up in a way that (1) allows each bit to be tested independently and (2) prevents agglomerations like -->{{#if:{{{gen|{{{dual|{{{pl|{{{dim|{{{f|{{{m|}}}}}}}}}}}}}}}}}}| (|}}<!--. If you prefer the latter, then by all means, feel free to use it.
That said, now that we have Scribunto, it would probably make more sense to put this in a simple Lua module. (The logic of the template is quite simple, but it ends up unwieldy in template notation because of having to support all the different cases that a comma is or is not present. In Lua, that's much easier.) Let me know if you want that.
Ruakh 19:13, 28 September 2013 (UTC)
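To illustrate why the comma handling is easier outside template syntax, here is a Python sketch of the logic (a Lua module would have the same shape). The parameter names come from the #if chain above; the printed labels are hypothetical and may not match {{dsb-noun}}'s actual wording.

```python
# Hypothetical labels for the parameters named in the #if chain above.
FORM_LABELS = [
    ("gen", "genitive"),
    ("dual", "dual"),
    ("pl", "plural"),
    ("dim", "diminutive"),
    ("f", "feminine"),
    ("m", "masculine"),
]


def headword_tail(forms: dict) -> str:
    """Return ' (label form, label form, ...)', or '' when no form is supplied."""
    parts = [f"{label} {forms[key]}" for key, label in FORM_LABELS if forms.get(key)]
    # Parentheses and commas appear only when there is something to separate.
    return f" ({', '.join(parts)})" if parts else ""


print(repr(headword_tail({"gen": "A", "pl": "B"})))  # ' (genitive A, plural B)'
```

Collecting the present forms in a list and joining them once replaces the nested #if tests that template syntax needs just to decide whether a comma or an opening parenthesis belongs at each position.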
Sure, if there can be a simple Luacized template, so much the better. I know absolutely nothing about Scribunto, so I can't do it myself. —Aɴɢʀ (talk) 21:18, 28 September 2013 (UTC)
I created Module:dsb-headword based on the other headword modules I made so far. It is split into two parts, one of which is specific to nouns and the other is general, so it should be fairly easy to extend the module for other parts of speech too. I replicated {{dsb-noun/new}}, with an extra bit that shows "plurale tantum" instead of the dual/plural forms, if the gender is "p". —CodeCat 21:59, 28 September 2013 (UTC)
I can't understand any of the code in the module, but I wrote {{#invoke:dsb-headword|show|nouns}} into this revision of User:Angr/test template and transcluded it into this revision of one of my sandboxes and it seems to do everything I want it to. If the gender is set to "d" it does not mark the term as a duale tantum, but that's probably just as well since some words, like starjejšej, have dual and plural forms but no singular forms, so even though they have no singular they're not exactly dualia tantum either. So now, Ruakh, do you have any objections to my replacing the contents of {{dsb-noun}} with {{#invoke:dsb-headword|show|nouns}} and deleting all of its subpages? —Aɴɢʀ (talk) 23:42, 28 September 2013 (UTC)
The original template didn't support dualia tantum, so I didn't add it here either. I noticed though that no distinction is made between masculine, feminine and neuter dual/plural, and also no distinction in animacy. Is that a particular feature of Lower Sorbian? —CodeCat 00:00, 29 September 2013 (UTC)
Frankly I'm not sure there even are any dualia tantum in the language; starjejšej is the closest thing to one I'm aware of, but it has a plural. Gender distinctions are lost in the dual and plural. Animacy is reflected in the declension templates for masculine nouns, but doesn't seem to be relevant for the headword line. Traditional dictionaries don't indicate it; they expect you to figure it out from the semantics. Some words are presumably animate in some senses and inanimate in others (e.g. póbijak, which can mean "someone who knocks people down" (animate) or "device for tightening the hoops on a barrel" (inanimate)). —Aɴɢʀ (talk) 14:30, 29 September 2013 (UTC)
I added animacy for {{sl-noun}} though, and I don't think it should be left to semantics because those can be deceptive. In Slovene for example, chess pieces are animate, which isn't really all that obvious. You also have to keep in mind that genders are "agreement classes" and animacy is part of that. Adjectives inherit the animacy from the noun too, so this is important information. So I think adding a distinction between m-an and m-in would be good. —CodeCat 14:36, 29 September 2013 (UTC)
But since dictionaries don't mark it, and since we have no editor who considers his competence in Lower Sorbian to be higher than level dsb-2 (and even he is very rarely around), how would we know? I haven't the remotest idea whether chess pieces are animate or inanimate in Lower Sorbian, and short of trawling through the online archives of Nowy Casnik in hopes of finding an article that mentions them (in the accusative singular!) or asking a question at w:dsb:Wikipedija:Portal (which has been edited a total of 4 times so far this year) I have no idea how to find out. Even for the declension tables I have to rely on my gut feeling about the semantics, and if I'm not sure I can just leave the declension table out. But headword lines can't be left out; I really don't want to be forced to commit myself to m-an or m-in when I with my meager knowledge of the language am the only editor actively working on it. —Aɴɢʀ (talk) 14:59, 29 September 2013 (UTC)
Then look at how {{sl-noun}} does it. It allows "m" alone, but treats it as incomplete. —CodeCat 16:23, 29 September 2013 (UTC)
Fair 'nuff, but it still feels unnecessary, probably because no other dictionary maker has felt the need to label animacy. But I guess it doesn't hurt anything either, so if you want to include it in the module, knock yourself out. Incidentally, are the Slovene names for chess pieces words that are unique to those meanings, or are they transferred from words that refer to humans anyway? Certainly in German, most names of pieces are masculine nouns that normally refer to humans (König, Läufer, Springer, Bauer; Dame is feminine and Turm refers to an inanimate object) so if that's the case in Slovene too it isn't actually surprising if they retain their grammatical animacy even when referring to an inanimate object. —Aɴɢʀ (talk) 16:45, 29 September 2013 (UTC)
In the absence of any objections, I have Luacized {{dsb-noun}} and deleted all of the subpages except {{dsb-noun/documentation}}. —Aɴɢʀ (talk) 20:48, 23 October 2013 (UTC)

October 2013


As you can see I have created the he-headword module, but have not yet directed any templates to it. So far it fully and backwards-compatibly supports all parts of speech except verbs.

I have created test templates at User:Wikitiki89/template:he-noun and User:Wikitiki89/template:he-adj, and test pages at User:Wikitiki89/בית, User:Wikitiki89/חם, and User:Wikitiki89/מצוין.

I think the module is ready for the real templates (except {{he-verb}}) to point to it, but I do not want to do this until someone reviews my code. I would appreciate it if you could do that.

--WikiTiki89 22:23, 21 October 2013 (UTC)

Backwards compatibility is good, but if the templates can be changed to use Lua's features to their advantage, they should be. —CodeCat 22:43, 21 October 2013 (UTC)
I did, and the deprecated features can be removed later once they are no longer used. Backwards compatibility provides a smooth transition. --WikiTiki89 00:33, 22 October 2013 (UTC)
Cool! I'll (try to) do that this weekend. —Ruakh 05:14, 22 October 2013 (UTC)
It looks good to me. I've made some refactoring-type adjustments, I hope you don't mind. —Ruakh 06:16, 28 October 2013 (UTC)
Thanks! Everything you did looks good. The one thing is that you kind of re-emphasized the triad concept, which I was hoping could now be replaced with a dyad concept (i.e. have process_wv_triad process three values but only return two). That way we could deprecate the "wv" form and keep just the base form and "dwv". However, this would interfere with your idea that "when a nikudless form is given, it should presumably supersede the nikudish form in determining the link target". --WikiTiki89 14:07, 28 October 2013 (UTC)
Hmm. I don't know. I agree that it makes sense to move to a dyad concept for input, but I'll have to think about the details of how that should work. —Ruakh 15:07, 28 October 2013 (UTC)

Updating t template


Could you explain how your bot has been updating the t template since Wiktionary:Votes/2013-09/Translation-links to other Wiktionaries? Does it parse recent changes and check all the links to other Wiktionaries for each translation? I ask because on the French Wiktionary, a bot does the update by analyzing the pages linked from the t template, but here the t template can now be used for every sort of translation: translations into languages that have no Wiktionary, translations with no entry in the associated Wiktionary, and so on. Sorry for my English, and thank you in advance for your answer. — Automatik (talk) 11:48, 22 October 2013 (UTC)

I do not examine recent changes, no; rather, I just use the database dumps. Specifically, I use the "List of page titles in main namespace" dump (all-titles-in-ns0.gz). —Ruakh 14:30, 22 October 2013 (UTC)
Sorry, but I didn't understand everything. When you have the list of all page titles in the main namespace, what do you do? Do you go through all the pages and analyze all the translations to see whether they have an equivalent? Thank you. — Automatik (talk) 16:12, 23 October 2013 (UTC)
Sorry, I didn't express that well. For the English Wiktionary, I use the normal XML dump ("Articles, templates, media/file descriptions, and primary meta-pages", pages-articles.xml.bz2) to see what pages have translation-templates to be updated. I use the "List of page titles in main namespace" dump (all-titles-in-ns0.gz) to see what entries exist on other Wiktionaries. (For a small number of Wiktionaries, namely zh.wikt and kk.wikt and iu.wikt, I also use their respective APIs to confirm that an entry doesn't exist before I change {{t+}} to {{t}}.)
Ruakh 16:39, 23 October 2013 (UTC)
Thanks for the detailed answer. But I must say there's still something I don't understand: how do you know which pages exist in the other Wiktionaries from all-titles-in-ns0.gz, which only lists the pages of the English Wiktionary (and not those of the other Wiktionaries, if I understood correctly)? Either way, thanks. — Automatik (talk) 20:15, 31 October 2013 (UTC)
Each Wiktionary has its own all-titles-in-ns0.gz file. I wrote a script that downloads all of these files, and another script that combines their contents into a single file with lines like this:
reification	et	fr	ko	pl	ru	ta	vi	zh
Ruakh 21:46, 31 October 2013 (UTC)
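The combining step described above could look roughly like the following Python sketch. It is not the actual script (the bot is written in Perl); the function names are hypothetical, and two details are assumptions worth checking against the real files: that titles in these dumps use underscores in place of spaces, and that some dumps begin with a page_title header line.

```python
import gzip
from collections import defaultdict


def read_titles(path):
    """Yield the lines of one wiki's all-titles-in-ns0.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from f


def merge_title_lists(lines_by_lang):
    """Map each title to the set of language codes whose Wiktionary has that page.

    lines_by_lang: {"fr": <iterable of lines>, "et": ..., ...}
    """
    langs_by_title = defaultdict(set)
    for lang, lines in lines_by_lang.items():
        for line in lines:
            raw = line.rstrip("\n")
            if not raw or raw == "page_title":  # skip blanks and a header line, if present
                continue
            langs_by_title[raw.replace("_", " ")].add(lang)  # underscores stand for spaces
    return langs_by_title


def format_line(title, langs):
    """One tab-separated output line: the title, then the sorted language codes."""
    return "\t".join([title] + sorted(langs))


def template_for(title, lang, langs_by_title):
    """{{t+}} if the other Wiktionary has the page, else {{t}} (hypothetical helper)."""
    return "t+" if lang in langs_by_title.get(title, ()) else "t"
```

Holding everything in one dictionary keyed by title makes each lookup cheap when the translation templates in the English dump are processed afterwards; for the few wikis mentioned above, an API check would still be layered on top before downgrading {{t+}} to {{t}}.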
OK, thanks a lot. To change {{t}} to {{t+}}, I suppose the method is the same, using all the pages that include {{t}}. — Automatik (talk) 12:05, 1 November 2013 (UTC)

User:Conrad.Irwin/editor.js and Module:links's remove_diacritics


I have replied on my talk page (also Wikitiki89). It seems to produce "alt=" even when there are NO diacritics, though. I have removed all "alt=" occurrences from navel#Translations. --Anatoli (обсудить/вклад) 02:50, 23 October 2013 (UTC)

Template:by extension

I believe your deletion of that template to be a grave error. There are two things you should have done and didn't. First off, you should have checked to see what links to it. There are over 50 pages that use that transclusion. Also, you didn't give any rationale. As such, I have filed an undeletion request. Purplebackpack89 (Notes Taken) (Locker) 09:55, 23 October 2013 (UTC)

It was deleted because it has been replaced with {{context|by extension}}. Most of the pages in the "What links here" don't actually transclude the template, but have switched over to the new version. Due to some caching issues which I don't completely understand (but maybe Ruakh does), they are still listed as transcluding it. --WikiTiki89 14:17, 23 October 2013 (UTC)
It's not a caching issue; it's just that when a given context-label isn't listed at Module:labels/data, Module:labels will check to see if the template exists. The software considers that check to be a transclusion, even though the template doesn't actually get transcluded. (This is because the major purpose of transclusion-tracking is so that when a template is edited, the pages that transclude it can be updated automatically. A page that checks a template's existence will also need to be updated when the template is created, so the software counts it as a transclusion.) —Ruakh 16:09, 23 October 2013 (UTC)
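A minimal Python sketch of the fallback just described, to show why the dependency gets recorded even when nothing is transcluded. Module:labels itself is Lua; the function name, its parameters, and the placeholder return values here are all hypothetical.

```python
def resolve_label(label, labels_data, template_exists, dependencies):
    """Hypothetical sketch of the label fallback described above.

    labels_data: known labels, standing in for Module:labels/data.
    template_exists: callable that checks whether a page exists.
    dependencies: set of page names this page is recorded as depending on.
    """
    if label in labels_data:
        return labels_data[label]  # no template lookup, so no dependency recorded
    name = "Template:" + label
    # The existence check alone records a dependency: if the template is created
    # later, this page must be re-rendered, so it is listed like a transclusion.
    dependencies.add(name)
    if template_exists(name):
        return "{{" + label + "}}"  # placeholder for actually expanding the template
    return label
```

This is why a page can show up under "What links here" for a deleted template: the lookup still happens, and the dependency is recorded, whether or not the template exists.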

"If you disagree with this rollback, please leave a message on my talk-page"

I really wish you wouldn't use rollback to remove talk-page messages. It makes it seem that they were placed in bad faith. Speaking of assuming bad faith, you essentially are assuming bad faith towards me, and I wish you wouldn't. I created Template:by extension because it made sense to me that people would look for (by extension) there first. I did it in good faith. I disagree with it being lumped into Template:context, but I disagree in good faith. Please assume good faith. Purplebackpack89 (Notes Taken) (Locker) 20:21, 23 October 2013 (UTC)

I am not assuming bad faith on your part. (Wikipedia has a page that you might benefit from: w:Wikipedia:Assume the assumption of good faith.) As for the rollbacks, I think you're simply mistaken. Perhaps it has that implication at Wikipedia, but here it does not. I often click 'rollback' on my own edits, for various reasons; is this to be taken as an admission of bad faith? —Ruakh 20:33, 23 October 2013 (UTC)

impf/pf parameters

Hi. Could you take a look at User_talk:Conrad.Irwin/editor.js#impf.2Fpf_parameters, please?

Changes to Russian entries


Could you make some changes to User:Ruakh/Tbot.js, please - for nouns, verbs, adjectives and adverbs?

The first parameter must be the term itself (with the stress mark, if there is one), without "head=". The second parameter for nouns is the gender, without "g="; for verbs it's impf/pf, if that is set.

As an example, a translation of undocking#Translations should produce {{ru-noun|расстыко́вка|f}}. --Anatoli (обсудить/вклад) 11:09, 27 October 2013 (UTC)

Done. Ruakh 17:12, 27 October 2013 (UTC)
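For reference, the requested output format can be sketched as follows. User:Ruakh/Tbot.js is JavaScript and this Python function is only an illustration of the parameter layout; only {{ru-noun}} appears in the request, so the verb, adjective and adverb template names are assumptions.

```python
def ru_headword(pos, term, gender=None, aspect=None):
    """Build a Russian headword template in the format requested above.

    pos is the template suffix ("noun", "verb", "adj", "adv"); the names
    other than ru-noun are assumed, not taken from the request.
    """
    params = [term]  # 1st parameter: the term itself, stress mark included, no head=
    if pos == "noun" and gender:
        params.append(gender)  # 2nd parameter for nouns: the bare gender, no g=
    elif pos == "verb" and aspect:
        params.append(aspect)  # 2nd parameter for verbs: "impf" or "pf", only if set
    return "{{ru-" + pos + "|" + "|".join(params) + "}}"
```

Called as ru_headword("noun", "расстыко́вка", gender="f"), this gives the {{ru-noun|расстыко́вка|f}} from the undocking example.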


I don't know whether this filter was written as the result of a discussion or if it was just your own idea. If the former, I'd appreciate being pointed to the discussion, and I'll take my issue there instead of here (or I'll read stuff there that will make this discussion moot). Otherwise, you can keep reading.

Many of the edits that the filter catches are people's attempts at adding bluelinks to lists in order to feed bots. Are those the intended target of the filter? If so (and it's not necessarily unreasonable), then fine; but, if not, then you may wish to reconsider the filter: perhaps rate-limit the edits rather than barring them, or perhaps include exemptions for various known bot-feeding pages.

​—msh210 (talk) 21:01, 30 October 2013 (UTC)

I created this filter because some spambots used to create user pages of other users to spread their links. You can still see them at the beginning of the abuse log. -- Liliana 21:13, 30 October 2013 (UTC)
Sorry for bothering you, Ran: I thought you'd written it.​—msh210 (talk) 21:29, 30 October 2013 (UTC)
No worries. The main page for the filter just shows who touched it last; to see who originally created it, you have to look at the history (in this case Special:AbuseFilter/history/24). —Ruakh 00:50, 31 October 2013 (UTC)
@msh210, See WT:Grease pit/2013/October#Can't update another User's subpage..--WikiTiki89 21:25, 30 October 2013 (UTC)
Thanks for the link.​—msh210 (talk) 21:29, 30 October 2013 (UTC)

November 2013

Etymology of fox in Scandinavian

Why did you roll back all my edits on the etymology of the word for fox in all the Scandinavian languages? It was well sourced, as opposed to the etymological entries it replaced. Please undo this. 15:13, 2 November 2013 (UTC)

A few reasons:
  1. You removed the etymologies of the actual words that the entries were about. For example, [[refur#Icelandic]] originally stated that the word was from Old Norse *refr; you removed that statement, replacing it with an explanation of where "the Nordic word" came from. I'm not sure if your motivation here was political — an insistence on viewing Old Norse and Icelandic as both being forms of "Nordic", rather than the latter coming from the former? — or if you just didn't find it interesting enough to include, or what, but regardless, it was bizarre and clearly wrong.
  2. Lots of formatting and technical details were terrible. For example, in your attempt to list all Scandinavian cognates, you added [[refur]] to (for example) Category:English terms derived from Finnish.
  3. Your central claim, that the word jumped from Iranian to Norse less than two thousand years ago, and to pre-Spanish at some unspecified time, jumping over the huge geographic regions separating these languages, seems patently absurd; at the very least, it requires some sort of explanation of how that is even remotely possible.
  4. One of your sources is from 1928, which is automatically suspicious; historical linguistics has come a long way in the past 85 years, and this reference is not worth listing. You might as well list your brother Bob.
  5. Your other source is a dictionary entry for a word that you claim is not relevant. The entry originally said that the Old Norse word came from Proto-Germanic, but you removed that claim and replaced it with a claim that it jumped directly over Europe. Your reference for this claim can hardly be a dictionary entry for a Proto-Germanic word that you've removed all references to.
I absolutely will not undo my rollbacks, but if your theory is less absurd than it looks on its face — or if the problem is just that you described the theory poorly — I will happily help you add it in an acceptable way.
Ruakh 15:58, 2 November 2013 (UTC)
Dear Ruakh, there is no need for insulting words. I wish you had checked out the references I gave, instead of criticizing the etymology because you deem it an unlikely one. The main source I quote is Kroonen's dictionary, which addresses the etymology of this word under the lemma I quoted. Please, check it out. It is a very reliable and up-to-date work on (Proto-)Germanic etymology. I included the reference to the work from 1928 because Kroonen references it. It is not unusual to have to rely on century-old works in historical linguistics (in particular, several German descriptive grammars from the 19th century on e.g. Sanskrit are still unsurpassed nowadays), so to say it is 'automatically suspicious' is in my opinion untrue.
Since this is my first contribution to Wiktionary ever, I had some trouble figuring out tags (and if I made a mistake with the Old Norse versus Nordic thing, you could have just adjusted that, instead of reverting the entire edit). I have to say that seeing an entry I put time and effort into reverted just like that (even though the former entry was completely unsourced and I could not find any proof of the existence of a Proto-Germanic word *rebaz in a reliable source) certainly discourages me from contributing again.
I agree that the leap from Iranian to Scandinavian is a large one (and the Spanish word may even be unrelated for all I know), but this is what the source says. It is not my job to defend this, as I am not an expert in Proto-Germanic, whereas Kroonen is. And I repeat: the previous entry was completely unsourced. In my opinion, a sourced entry is better than an unsourced one (especially considering that the source in this case is a well-esteemed one). Besides, it would not be the only word to end up in Germanic from an Iranic source (cf. the word 'path', coming from Alanic/Ossetic 'paða').
I do not understand your 5th point: a dictionary entry for a word that you claim is not relevant. What word did I claim not to be relevant? The lemma in the Proto-Germanic dictionary I am referring to is *fuhsa-, where the etymology of the 'Nordic' word for fox is addressed, you would have seen this if you had checked it. The same dictionary did not contain a lemma for a word *rebaz (or something similar). Seeing as the only descendants of this purported Proto-Germanic word *rebaz are in Scandinavian languages, one can clearly not reconstruct it back to Proto-Germanic without evidence of the existence in at least one language that is not North-Germanic, so this explains the absence of it in Kroonen's dictionary.
I hope this will convince you to undo your revert. If you think the info is not up to standard, incomplete, or has some wrong tags in it, you are free to improve it or mention other theories that you can find sources for. But to replace this referenced etymology by an unreferenced one that has no claim for even being there, is in my opinion to regress the article to a lesser state of correctness. I thought continuous improvement in small cumulative steps was the point of an open dictionary, so I hope my contribution will not be ignored, but instead be accepted and left there to be improved upon by whoever can do so. 20:40, 2 November 2013 (UTC)
Re: "I had some trouble figuring out tags": That's totally understandable, and it alone would not be a reason to revert. It left a lot of cleanup work for other editors to do, but if the edit had otherwise seemed correct, I would have done the cleanup work myself. (I also would have fixed the entries where you completely removed the etymology, with only a pointer to the Icelandic entry.)
Re: path: It's true that it "end[ed] up in Germanic from an Iranic source", but you're leaving out the important detail that it entered Proto-Germanic from Scythian, which is geographically and chronologically plausible. But it does not make sense for a word to have jumped directly from an Iranian language to Old Norse.
Re: "I do not understand your 5th point": Oops, you're right, my mistake!
Kroonen's dictionary costs way too much for me to consider buying it, especially given the extreme negativity of the sole Amazon review (even despite the reply that disagrees in many particulars), and WorldCat says the closest library to me that has the book is more than 2,000 miles away (more than 3,000 km). Would you be willing to e-mail me the text of the relevant entry?
Ruakh 03:56, 3 November 2013 (UTC)

Language-specific CSS at MediaWiki:Common.css

I asked a question at Wiktionary:Grease_pit/2013/November#Language-specific CSS at MediaWiki:Common.css that you might be able to answer. Please take a look. Thanks. --WikiTiki89 03:06, 7 November 2013 (UTC)


I was wondering why this was deleted. We figured ski + opolis meant "ski city", or, as we use it, that the resort is full and busy —This unsigned comment was added by (talk) at 05:38, 8 November 2013‎ (UTC).

The word skiopolis does not seem to exist; I couldn't find any evidence at all that anyone uses it with the meaning given in the entry. We only include terms for which "it's likely that someone would run across it and want to know what it means", which requires a certain level of actual usage that someone might run across. See Wiktionary:Criteria for inclusion, especially the "Attestation" section. —Ruakh 19:21, 8 November 2013 (UTC)


I got your message. I reverted myself because I had only instated the anon's edit in the first place so I could cite it in this not-really-related discussion as an example of an edit that was blocked by our "stop very new users from adding links" filter. (Side note: perhaps this highlights a need to tweak that filter so that it only blocks edits that add links? The anon's edit didn't add a link, but was blocked because it retained one.)
I figured a discussion of üzeri itself was the appropriate way of deciding what to do with that entry.
If you think reinstating the anon's version is the easiest way of cleaning up the entry, go right ahead. I suppose the ideal thing would be to find which references posit the Altaic connection and cite those along with some language about how controversial the Altaic theory is, but I'm not sure I know where to look for such references, so removal may be the best practical option. - -sche (discuss) 05:34, 10 November 2013 (UTC)

Re: "The anon's edit didn't add a link, but was blocked because it retained one": If you look at the timestamps, you'll see that the blocked edit actually came a few minutes before the page was created. The anon tried to create the page, was blocked, then logged in and successfully created it. (So you actually could have just linked to the original page version to give your example, rather than trying to install the anon's version.) (Obviously I didn't realize this, either, at first.)
But we still do have a problem, because the link that was added was not put directly in the wikitext, but rather added via {{R:tr:Nishanyan}}. That sort of link should be exempted.
The nice thing about the built-in added_links condition is that it seems to be smart enough to tell that a given link was genuinely added in a given edit (whereas merely examining added_lines, as we would do if we had to write our own condition, would run afoul of cases where a user tried to edit a line already containing a link), so I don't want to dispense with it completely, but we might add an additional check that added_lines actually contains either http or [// in the wikitext itself.
Ruakh 05:53, 10 November 2013 (UTC)
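The proposed combination of checks can be sketched in Python as below. This is not AbuseFilter rule syntax (in the filter itself it would be an extra clause on added_lines alongside added_links); the function and its arguments are a hypothetical stand-in for the filter's variables.

```python
def should_block(added_links, added_lines):
    """Python stand-in for the proposed filter condition (not AbuseFilter syntax).

    added_links: external links the software reports as newly added by the edit,
        including links produced by templates such as {{R:tr:Nishanyan}}.
    added_lines: the raw wikitext lines the edit added.
    """
    if not added_links:
        return False  # keep the built-in check: no genuinely new link, nothing to block
    # Extra check: the link must also appear literally in the added wikitext,
    # which exempts links that only come from reference templates.
    return any("http" in line or "[//" in line for line in added_lines)
```

Both conditions have to hold, so the built-in added_links test still filters out edits that merely touch a line already containing a link, while the literal-text test lets template-generated references through.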

Technical advice needed

In my efforts to make a useful set of entries for taxonomic names, I am focusing on those that correspond to normal-language entries, i.e., not Translingual. Inserting {{taxlink}} is a somewhat labor-intensive way of achieving that. I have already done searchbox searches for the words species and genus and added {{taxlink}} wherever it was appropriate. I have tried looking at categories like Category:en:Fruits etc.

Whenever a new taxon is added, I add taxlink wherever appropriate to any redlinked taxonomic name contained in entries that contain the taxon and plain links to at least the first use in an L2. In doing this, I notice that there are many entries which offer no obvious clue that they contain a taxonomic name. That means that I would have to do a more all-encompassing search to find them. I can't really search for "all" taxonomic names or rather it is silly to try as there are millions, most of which are of interest only to specialists to whom Wiktionary is not a plausible resource. So, I would like to exclude all subgeneric names and all specific epithets from my searches. That means all the words I am looking for should be capitalized in Wiktionary and effectively all are single words. I can get a usable list of all entry titles in Wikispecies, which could be reduced to the one-word titles. The intersection of WS one-word titles and capitalized words appearing in definitions worded in English in Wiktionary (excluding sentence-initial capitals and those from abbreviation entries if that is feasible and speeds things up), removing duplicates, would be a valuable list for subsequent use:

  1. all redlinks are candidates for addition;
  2. all blue links should be linkified at least once in each L2 in which they appear in a definition or list.

Some of this I can do or can learn to do with dump processing using Perl or Python (recommendations about language?)

Linkification would seem botworthy. I know nothing about bots. What course of action do you recommend?

What is an efficient way to produce the list of capitalized words that are not sentence-initial? Is this likely to take hours, days? (I wish I had installed 64-bit Windows so I could use more than 4 GB of RAM.)

Am I missing some alternative? Am I thinking about this wrong?

Please answer at your convenience. If you think I should try someone else, please make a recommendation. User:Pengo comes to mind. DCDuring TALK 18:52, 14 November 2013 (UTC)

I think the best approach is generally to try things, and see if they give you what you want. It's hard to predict what you'll find when you dive through the dumps; our formatting is really all over the place.
Even with the RAM you do have, you can probably hold a Perl hash or Python dictionary of all the Wikispecies page-titles you're interested in, so you can just read those all in from a file, and then examine the database dump to look for matches. I don't know if it's really all that necessary to filter out sentence-initial capitals: are there very many genus-and-higher names that are also normal words likely to occur sentence-initially?
I love and use Perl, but language preference can be a very personal thing. Probably the biggest objective factor is, who you expect/hope to get help from. If you want help from me, you probably won't want to use Python, though Perl isn't your only option. (I mean, for one thing, I'm a professional Java developer. Perl is the terse language I turn to when nobody's paying me to be verbose. :-)   ) But overall, many more Wikimedians seem to use Python than Perl, or indeed than any other language, and if you want help from (say) DTHLS or CodeCat, you're certainly better off with Python than with Perl.
I'll also add that, while you may not find this very satisfying, it's probably easier for me to give you working code, which you can then tweak as desired, than to walk you through writing a program from scratch.
Ruakh 07:24, 19 November 2013 (UTC)
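A minimal sketch of the "read the Wikispecies titles into a dictionary, then scan the dump" approach, in Python since that is the more common choice mentioned above. It assumes the titles come one per line with underscores for spaces, that definition lines start with "#", and it only matches ASCII capitalized words; none of that has been checked against the actual dumps, and sentence-initial capitals are deliberately not filtered, per the question above.

```python
import re

# A capital letter followed by at least one lowercase letter (ASCII only; an assumption).
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-z]+\b")


def load_one_word_titles(lines):
    """Keep only the one-word titles (genus and higher) from a title list."""
    titles = set()
    for line in lines:
        title = line.strip().replace("_", " ")
        if title and " " not in title:
            titles.add(title)
    return titles


def find_taxon_candidates(wikitext_lines, species_titles):
    """Count capitalized words in definition lines that are also Wikispecies titles."""
    counts = {}
    for line in wikitext_lines:
        if not line.startswith("#"):  # definition lines in wikitext start with '#'
            continue
        for word in CAPITALIZED_WORD.findall(line):
            if word in species_titles:
                counts[word] = counts.get(word, 0) + 1
    return counts
```

The set holds only one monster at a time; the dump is streamed line by line, so memory use is dominated by the title set rather than by the dump itself.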
Thanks for the response. You must be fairly busy with your paying job.
I'd like to actually learn either Perl or Python, at least well enough to make lists. I was leaning toward Perl. But can either language be used as effectively for running a bot?
I definitely need to experiment. I was just concerned to try to avoid blind alleys. I'm not optimistic about getting help from CodeCat. Pengo has interests in taxonomic names and has produced lists so I could bother him about some things that are specific to the data.
In terms of general strategy, the idea would be to have one monster hash at a time. If I need to work with more than one to accomplish some purpose, it should be done in such a way as not to try to have two monsters at the same time. I should "undef" one after I've done as much as possible with it, before defining the other. The reason I ask is that I expect that I will be using the principal namespace pagetitles from wikispecies, wiktionary, and wikipedia, as well as the list of all capitalized (mostly) English words from Wiktionary. These all seem like monsters. If any of these get too big I can break them up by groupings based on first letter.
I should also take a good look at the XML for typical target pages (eg, Wikispecies pages with vernacular name templates) for each wiki whose content I would use, to help with my regular expressions. DCDuring TALK 17:07, 19 November 2013 (UTC)
I definitely think you should learn Perl (or Python or whatnot). When I said I would send you working code to tweak, "tweak" was probably the wrong word: for what you want to do, you would want to really understand the code and be able to make significant changes to it. I just meant that there was no reason to start from scratch.
All of my bots are in Perl (and I'd be happy to send you source-code to use and modify); most other bot owners use Python (with the open-source pywikipediabot framework).
I agree that you probably only want one monster hash at a time, if only because a hash is a one-way lookup, so the only way to use hashes to compare large sets to each other is to put one set in a list and the other in a hash, and iterate over the former while doing lookups in the latter. For more general comparisons (e.g., n-way comparisons), you'd probably be better off with sorted lists or something.
Ruakh 22:09, 19 November 2013 (UTC)
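For the n-way case, one option along the lines of the sorted-lists suggestion above is a streaming merge. The sketch below is in Python with hypothetical function names; it assumes every input is already sorted in code-point order (for UTF-8 files, `LC_ALL=C sort` produces that order), since a different collation would break the merge.

```python
import heapq
from itertools import groupby


def _tag(titles, name):
    """Pair each title with the name of the list it came from."""
    for title in titles:
        yield title, name


def presence_table(named_sorted_lists):
    """n-way comparison of sorted title lists in a single streaming pass.

    named_sorted_lists: {"wikispecies": sorted iterable of titles, ...}.
    Yields (title, [names of lists containing it]) in sorted order; only the
    current head of each input is held in memory, so inputs can be file iterators.
    """
    streams = [_tag(titles, name) for name, titles in named_sorted_lists.items()]
    for title, group in groupby(heapq.merge(*streams), key=lambda pair: pair[0]):
        yield title, sorted({name for _, name in group})


def sorted_intersection(a, b):
    """Two-way special case: titles present in both sorted lists, no hash needed."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common
```

Unlike the hash approach, no input has to fit in memory, so the Wikispecies, Wiktionary and Wikipedia title lists could be compared at once without undefining one monster before loading the next.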
Not starting from scratch is good. After all, I am not going to be a professional programmer peddling my mastery of Perl or Python to all and sundry. Initially my objectives are quite limited indeed, being focused solely on Translingual. It would clearly save a lot of time if I could start from code that works for Wiktionary. An example of two-level sorting would be very useful, though I think I know how to do it. Also something that used large, not monster, hashes of short lists (for work with many/all translingual entries, tracking deficiencies of the entry and of the definitions). Any simple Perl-based bot would be nice, so that I could understand how they work in Perl and perhaps compare them with the pywikipediabot framework. If you think I should look at some other things, especially before these items, please send those too.
Thanks for the help. I hope that I can get going on this. Real life intrudes over the holidays. DCDuring TALK 23:19, 19 November 2013 (UTC)
So, sorry, is there any action item (if you'll pardon the business speak) out of this? Am I supposed to be sending you a script that does something? If so, what is it? —Ruakh 18:51, 28 November 2013 (UTC)

Template:support, etc

Actually, the redundant "#" does result in there being two "#"s, if people subst the template (which they routinely do, because that was its behaviour for a long time). You can see this in my test here. You can also see it four times in the history of the Proto-Altaic vote, which is where I noticed it. You're right that having an explicit "#" in the wikitext is clearer, though... so let's remove the hash from Template:support, etc. - -sche (discuss) 18:05, 28 November 2013 (UTC)

If you've only seen it four times in the history of the Proto-Altaic vote, then the vast majority of voters have been writing {{subst:support}} without an explicit additional #. (Which makes sense. We've always recommended just {{subst:support}}, and that's what the wikitext comments on that page say to write.) The fact that # {{subst:support}} results in two hashes is not new, and has never been a problem — sometimes we feel emotionally compelled to clean up the extra #'s, but they don't have any real effect — whereas if bare {{subst:support}} stops working properly, I think that really will be a problem. (I mean, it's not the end of the world. People will quickly learn, just as they've learned to add ~~~~ now that the template doesn't provide it even when substituted. But it's still more harm than good.) —Ruakh 18:49, 28 November 2013 (UTC)

December 2013

Categories for Rhymes

I think Wiktionary:Grease pit/2013/December#Rhymes categories again could benefit from your input. —CodeCat 18:07, 7 December 2013 (UTC)


I don't mind removing the script error, but I do object to removing the ability to track and find errors. If you don't like the error, can you at least replace it with a tracking category? —CodeCat 22:11, 8 December 2013 (UTC)

Well, but they go in Category:Language code missing/usex now. Is that not satisfactory? —Ruakh 22:34, 8 December 2013 (UTC)
I suppose it is, but I don't know if those two things are at the same level of wrongness. —CodeCat 00:16, 9 December 2013 (UTC)
I agree that a wrong language-code is much more wrong than a missing one; but in the case of {{usex}}, both are about equally easily fixed, and the category has only a few entries anyway. So if there had already been a category-tree for incorrect language-codes alongside the one for missing ones, I'd have used it, but there wasn't, so I didn't want to create one just for this. (By all means, though, you should feel free to do so.) —Ruakh 00:23, 9 December 2013 (UTC)

Hebrew transliteration


I tried to understand your position on Hebrew transliteration from this discussion, but I couldn't, since I don't know enough Hebrew and the examples used were not clear. AFAIK, Hebrew often can't be transliterated automatically or literally (letter-to-letter) with acceptable results, even with vowel points. So, what's the deal with the transliteration? Are there any exceptions (letters pronounced differently, non-standard spellings with no effect on pronunciation)? BTW, I'm not canvassing, I really want to know. --Anatoli (обсудить/вклад) 03:45, 9 December 2013 (UTC)

I don't have a firm position; I'm open to a lot of possibilities. Over time, the Hebrew editors have coalesced on using a transliteration based on the Ashkenazi variant of Modern Israeli Hebrew pronunciation, even for Biblical quotations. There are a number of reasons for this, but basically the overarching reason is that all approaches have problems, and the problems with our current approach seem to bother us less than problems with alternative approaches.
I would not consider any literal/automatic transliteration to be acceptable, for the same reason you feel that г should be transliterated 'v' in some cases. There are strangenesses that we accept with Hebrew spelling, but that are magnified/exacerbated when you try to transliterate them into an orthography that doesn't share those strangenesses. For example, the Masoretes apparently assigned a single vowel marker, called kamáts — it's the 'T'-like character under the letter in אָ — to two distinct sounds. At that time, the two sounds apparently differed mainly or exclusively in vowel quantity (length) rather than quality — and in fact, in many present-day dialects they still sound more or less the same — but in mainstream Modern Hebrew they are now quite different, the long vowel having merged with /a/ and the short one with /o/. In fact, Modern orthography will sometimes write the latter with a consonant letter called vav, otherwise used to write /w/, /v/, /u/, and /o/. If we take a literal/automatic transliteration approach, how do we assign a symbol to that?
Ruakh 04:10, 9 December 2013 (UTC)
Thank you. I knew there was something in Hebrew similar to Arabic و and ي, and to ى, which is often written instead of ي; there a literal/automatic transliteration approach would wrongly give ā instead of ī or (a)y. --Anatoli (обсудить/вклад) 04:16, 9 December 2013 (UTC)
If I understand correctly, kamats is one of the main problems with automatic transliteration of Hebrew. It seems to require manual intervention by native speakers. --Anatoli (обсудить/вклад) 04:25, 9 December 2013 (UTC)
Not native speakers so much as scholars of Hebrew. Ordinary native speakers generally know very little about the details of the vowel system and frequently mispronounce kamatz katan (the one that's supposed to be /o/) as /a/, when the word is not written with a vav. --WikiTiki89 04:45, 9 December 2013 (UTC)

Something strange happening with Module:families transclusions

This module is orphaned, and shouldn't really have any more transclusions. But for some reason, more new transclusions keep appearing, even though they disappear again when I do a null edit on those pages. Do you know what might be happening? —CodeCat 23:48, 12 December 2013 (UTC)

Nope. It suggests that pages are being partially updated as part of job-queue-ness, but I don't know enough about how that works. When you find out, please let me know. :-)   —Ruakh 00:51, 13 December 2013 (UTC)

A "language code sanity check" module?

Right now we have a lot of things concerning language codes that are conventions, but they're not really enforced by anything. For example, nothing enforces that every language has a script and a family, nothing enforces that things in Module:etymology language/data don't clash with any other codes, and nothing checks to see whether all the codes in Module:languages/data3/a really are all 3-letter codes beginning with "a". Do you think it would be useful to write a single-purpose module for this, and transclude it on one single page? The module would trigger a script error or add that one page to a category if anything is not right, which would then alert people to the problem. —CodeCat 21:49, 14 December 2013 (UTC)

Makes sense to me. (We need more of these sorts of tests.) —Ruakh 02:32, 15 December 2013 (UTC)
I just wrote a module like that: Module:data consistency check. Test it here. Feel free to take it from there. Keφr 17:42, 17 December 2013 (UTC)
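The checks CodeCat lists could be sketched roughly like this. It is a minimal Python illustration, not the Lua of Module:data consistency check; the function name `check_consistency`, the dictionary shape (`scripts` and `family` keys) and the sample codes are hypothetical stand-ins for the real data modules. A fuller version would also compare etymology-language codes against family codes and every languages data module, not just one.

```python
import re
from typing import Dict, List


def check_consistency(languages: Dict[str, dict],
                      etym_languages: Dict[str, dict],
                      prefix: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # Codes in a hypothetical "data3/<prefix>" module should be exactly three
    # lowercase letters, beginning with the module's prefix letter.
    code_pattern = re.compile(rf"^{re.escape(prefix)}[a-z]{{2}}$")
    for code, data in sorted(languages.items()):
        if not code_pattern.match(code):
            problems.append(f"{code}: not a 3-letter code beginning with {prefix!r}")
        if not data.get("scripts"):
            problems.append(f"{code}: no script listed")
        if not data.get("family"):
            problems.append(f"{code}: no family listed")
    # Etymology-only language codes must not reuse a regular language code.
    for code in sorted(etym_languages):
        if code in languages:
            problems.append(f"{code}: etymology-language code clashes with a language code")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, not real Wiktionary entries.
    langs = {
        "aaa": {"scripts": ["Latn"], "family": "fam1"},
        "ab": {"scripts": [], "family": "fam2"},
    }
    for problem in check_consistency(langs, {"aaa": {}}, "a"):
        print(problem)
```

On the single test page, a non-empty result is what would trigger the script error or tracking category CodeCat suggests.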


Why? --Z 13:47, 30 December 2013 (UTC)

Because we don't need to detect the script if we already know it. —Ruakh 20:48, 30 December 2013 (UTC)
That's not the way it currently works. If there is only one script, it still tries to detect, which I think is stupid. --WikiTiki89 21:07, 30 December 2013 (UTC)
We did it that way because we didn't originally know how reliable script detection was. That's why Category:Terms using script detection fallback exists; it was meant to allow us to track down cases where detection failed, so that we could fix it. —CodeCat 21:29, 30 December 2013 (UTC)
I don't see how that's relevant. If a language has only one script, that script should always be used (unless another one is specified with sc=). If a language has more than one, then the script detection should attempt to detect which of the allowed scripts it is, but if it fails, the fallback should be the language's default script and not some script that is not allowed for a language. --WikiTiki89 22:56, 30 December 2013 (UTC)
The concept of a "default" script only applied when we used templates. We don't really use it in Lua anymore, all the listed scripts are equal. In any case I think you're missing the point. The reason why script detection was done even when there was only one script, was to detect when it fails, so that we can improve the detection itself. And there were several thousand pages in the category before Ruakh reverted it; all of those pages either had a problem with the entry, or a problem with the detection. Either way, it was a useful to-do list. —CodeCat 00:01, 31 December 2013 (UTC)
It's fine to see when it fails, but that doesn't mean you can't still use the right script rather than the detected one. --WikiTiki89 00:08, 31 December 2013 (UTC)
How is it supposed to know what the right script is, if it fails to figure it out? —CodeCat 00:10, 31 December 2013 (UTC)
By choosing the first script on the language's list. There has to be some kind of default anyway, otherwise what would you do in cases where the text is just general punctuation shared by most scripts? --WikiTiki89 00:19, 31 December 2013 (UTC)
So you're saying the first script of the language is a better default than the best match out of all scripts. That can work sometimes, but it doesn't work always. Some very arcane character might not even be supported by the first listed script (its font, specifically) and it actually needs extra support. And that's the kind of thing we're trying to detect with this: to see at what points the detection isn't able to figure out what to use on its own. In the past, we would add sc=, but I think we can do better than that now. Ideally, script detection should be able to figure out anything at all, and I think that's doable as long as we have one script that matches "anything not in the others". —CodeCat 00:26, 31 December 2013 (UTC)
That's exactly what I'm saying. In the rare case that a script that is not normally used for a language is needed, you should use a sc= tag. Script detection should only be used to tell which of several allowed scripts is currently being used. Trying to find a match from the set of all known scripts is a bad idea because there are too many overlaps. --WikiTiki89 00:37, 31 December 2013 (UTC)
The purpose of my edit was not detecting script when we already know it, but to find possible mistakes in entries (writing the word in wrong script, or providing wrong language code) and in Module:scripts/data (such as that of polytonic). --Z 10:02, 31 December 2013 (UTC)
We can do that without having to try all possible scripts. (In the case of polytonic that you mention, I never looked at the generated HTML, but I assume that we were mistagging the text with Grek.) —Ruakh 02:20, 1 January 2014 (UTC)
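The approach WikiTiki89 describes (honor sc=, use the only script if there is just one, otherwise pick the best match among the language's own scripts and fall back to the first listed) might look like the following. It is a minimal Python sketch, not the actual Lua modules; `SCRIPT_RANGES` holds only a few simplified Unicode blocks, and `choose_script` and its returned flag are hypothetical. The flag is one way to keep the tracking CodeCat and Z want without testing every known script: a caller could put pages where it is False into a tracking category such as Category:Terms using script detection fallback.

```python
from typing import Dict, List, Optional, Tuple

# Simplified, illustrative Unicode ranges (not Wiktionary's real script data).
SCRIPT_RANGES: Dict[str, List[Tuple[int, int]]] = {
    "Latn": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Cyrl": [(0x0400, 0x04FF)],
    "Grek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
    "Hebr": [(0x0590, 0x05FF)],
}


def count_matches(text: str, script: str) -> int:
    """Count the characters of text that fall inside the script's ranges."""
    ranges = SCRIPT_RANGES.get(script, [])
    return sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges))


def choose_script(text: str, allowed: List[str],
                  sc: Optional[str] = None) -> Tuple[str, bool]:
    """Return (script, matched); matched is False when the result is only a fallback."""
    if sc:
        # An explicit sc= overrides detection entirely.
        return sc, True
    if not allowed:
        raise ValueError("a language must list at least one script")
    if len(allowed) == 1:
        # Only one possible script: always use it, but report whether it matched.
        return allowed[0], count_matches(text, allowed[0]) > 0
    # Several scripts: consider only the language's own scripts.
    best, best_count = allowed[0], 0
    for script in allowed:
        n = count_matches(text, script)
        if n > best_count:  # strict '>' so ties keep the earlier-listed script
            best, best_count = script, n
    return best, best_count > 0


if __name__ == "__main__":
    print(choose_script("Привет", ["Latn", "Cyrl"]))  # ('Cyrl', True)
    print(choose_script("...", ["Latn", "Cyrl"]))     # ('Latn', False)
```

Text made only of shared punctuation falls back to the first listed script, which is the "default" WikiTiki89 argues is needed anyway.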

Hebrew accelerated creation

I've been updating the accelerated creation script, and I've fixed some problems with it for Hebrew. Could you check to see if it's all ok? Here is an entry with green links: מוהל‎. —CodeCat 15:47, 31 December 2013 (UTC)

Thanks. The pre-fill-ins from the green links look good to me. —Ruakh 22:49, 31 December 2013 (UTC)
I haven't been able to find a way to add the head= form yet. The form is provided by the link in the template (it looks at the content of the link element), and that part works. But I'm not sure if there is a straightforward way to retrieve the page name that the link points to, because it's not a regular wiki link but a plain internet link. This would be needed to tell whether head= is needed, because it shouldn't be added if the display form of the link is the same as the entry name. —CodeCat 00:12, 1 January 2014 (UTC)
You could have the template itself provide that information through extra CSS classes. --WikiTiki89 02:01, 1 January 2014 (UTC)