User talk:Ruakh
ISBN
What does the ISBN code mean in this edit? Pass a Method (talk) 20:48, 7 January 2013 (UTC)
- See w:ISBN. —Μετάknowledgediscuss/deeds 21:11, 7 January 2013 (UTC)
yákhas ót l'rá'ash
Are you sure about the transliteration? Laráash (לָרַעַשׁ) sounds better to my ears than l'ráash, but I may be overly influenced by my greater familiarity with Biblical Hebrew than with modern. (Cf. Gesenius, 102h.)—msh210℠ (talk) 06:57, 9 January 2013 (UTC)
Template:list:Hebrew script letters/he
Hi, Ruakh. I'm trying to convert templates to the new format used in User:CodeCat/list helper which is less resource-intensive. But with this template I am having some problems, because my browser is acting strange with the Hebrew characters. When I press enter it changes the order of the characters, and I don't know the Hebrew alphabet so I am afraid to mess up and put the letters in the wrong order by accident. You probably have more experience dealing with such things so could you try? Template:list:Latin script letters/en has an example you can work from, but Hebrew doesn't use letter casing so there would probably be only one letter per line. —CodeCat 02:20, 11 January 2013 (UTC)
- I gave it a whirl. I'm not really sure how it should look; I kept the letters in the left-to-right order they already had (which is kind of backward-looking, since Hebrew is read right-to-left, but not a huge deal), except that since the Latin uppercase letters were unseparated from their lowercase counterparts, I did the same thing for Hebrew medial and final forms, so for those I had to put them in right-to-left order, because ךכ looks like an April Fool's prank. Feel free to reverse the overall order, or make any other tweaks, or anything. —Ruakh 02:40, 11 January 2013 (UTC)
- Thank you! I will trust your judgement when it comes to Hebrew, because I know nothing about it, so I can't judge how it should look or what looks good or bad. If you think the order should be reversed, that is OK, but keep in mind that English users will expect list elements to be ordered left to right, even if the individual items are read right to left. Template:list:days of the week/yi was made that way too. —CodeCat 02:46, 11 January 2013 (UTC)
- Personally, I think it looks weird and would prefer right-to-left. days of the week/yi is acceptable to me, but that's because English-speaking users might be going to those pages for their semantic value, in which case left-to-right order is most logical. This is going to be chiefly used for the letters' value in writing, not semantic meanings, and users who are looking at the pages are more likely to know that Hebrew is right-to-left. It would then reflect the universal presentation of the script letters, rather than our idiosyncratic mixture of directions, as if someone was taking xkcd seriously. —Μετάknowledgediscuss/deeds 03:06, 11 January 2013 (UTC)
- Note that the Arabic alphabet lists (Template:list:Arabic script letters/ar, Template:list:Arabic script letters/fa, etc.) are currently right-to-left. --WikiTiki89 19:23, 11 January 2013 (UTC)
- FWIW, I would support putting the Hebrew letters of the alphabet in, erm, alphabetical order, right-to-left. I find it rather surreal that they're listed left-to-right. - -sche (discuss) 04:41, 13 January 2013 (UTC)
Done —Μετάknowledgediscuss/deeds 05:35, 13 January 2013 (UTC)
- I've reverted and re-done it a different way, I hope you don't mind. (Putting the letters in reverse order, while forcing that order to be presented LTR, seems rather hackish to me. Logically, it makes more sense to put the letters in the correct order, presenting it RTL. Please revert if there was a reason for the other approach.) —Ruakh 06:23, 13 January 2013 (UTC)
- I think HTML already treats Hebrew text as RTL by default, so the RTL markers probably aren't necessary. —CodeCat 13:05, 13 January 2013 (UTC)
- You are correct, but I included them because I think that maybe our templates should include LTR markers around Hebrew-script text, so I'm operating under the premise that maybe we'll do that someday. If so, then in the rare cases that we really want the context containing Hebrew-script text to be RTL, we would have to explicitly include RTL markers; and in the meantime, they're harmless; so it seemed like a sort of future-proofing. —Ruakh 16:03, 13 January 2013 (UTC)
- Hackish? Yeah, kinda. I honestly just did it because it was easier that way (somewhat embarrassingly, while singing the call-and-response אלפבית song...) —Μετάknowledgediscuss/deeds 16:50, 13 January 2013 (UTC)
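CodeCat's remark that Hebrew text is already treated as RTL without explicit markers follows from the Unicode Bidirectional Algorithm (UAX #9), which browsers use to lay out mixed-direction text: every Hebrew letter has the strong bidi class R. The Python sketch below is illustrative only and is not part of any template. It checks the classes with the standard `unicodedata` module, confirms that the RLM and LRM marks discussed above are themselves strong R and L characters, and shows that Unicode code-point order puts each final form before its medial form (ך U+05DA before כ U+05DB), so a list built by naively sorting code points would produce the ךכ pairing Ruakh wanted to avoid. The helper `medial_before_final` is a hypothetical name for one way of fixing that.

```python
import unicodedata

# Hebrew letters occupy U+05D0..U+05EA (22 letters plus 5 final forms).
# In code-point order each final form comes just before its medial form.
HEBREW_LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EB)]
FINAL_FORMS = {"\u05DA", "\u05DD", "\u05DF", "\u05E3", "\u05E5"}  # ך ם ן ף ץ


def bidi_classes(chars):
    """Return the set of Unicode bidirectional classes of the characters."""
    return {unicodedata.bidirectional(ch) for ch in chars}


def medial_before_final(letters):
    """Swap each final form with the medial form that follows it."""
    out = list(letters)
    i = 0
    while i < len(out) - 1:
        if out[i] in FINAL_FORMS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out
```

For example, `bidi_classes(HEBREW_LETTERS)` returns `{'R'}`, which is why no explicit RTL marker is needed for the letters themselves; the markers only matter if a surrounding context is forced to LTR, as Ruakh anticipates.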
接管
I don't speak IPA, but something tells me 接管 is not pronounced /ʨk/. ---> Tooironic (talk) 04:23, 13 January 2013 (UTC)
- I agree. —Ruakh 04:28, 13 January 2013 (UTC)
Done —Μετάknowledgediscuss/deeds 05:26, 13 January 2013 (UTC)
- Well, you fixed that one entry — and thank you :-) — but that doesn't really solve the overall problem . . . —Ruakh 06:17, 13 January 2013 (UTC)
- Tooironic only complained about one entry. As you know better than I, you can certainly scan a database dump for Mandarin IPA significantly shorter than the pīnyīn values, which should bring up all examples of this bug, and I'm sure that between me and the Mandarin regulars we can fix them all by hand. —Μετάknowledgediscuss/deeds 16:54, 13 January 2013 (UTC)
- Wait, really? I guess it never occurred to me that y'all would be willing to do that. The list of all 183 problematic entries is at User:Metaknowledge/py-to-ipa-problems. Thank you! :-D —Ruakh 17:07, 13 January 2013 (UTC)
- Сумпор! Wait, wrong language. Uh, um, 了不起!(or something like that, not sure if I got it right...) While we're at it, can you explain why this bug even happened? —Μετάknowledgediscuss/deeds 17:11, 13 January 2013 (UTC)
- I really don't know. Either the template was broken to begin with and no one even noticed, or something broke in the template during the process of making it substitutable. (The latter case has two subcases: the breakage could have been substitution-specific — like, say, maybe an #if: was made safesubstitutable, but its condition contained a nonsafesubstituted template, in which case the order of evaluation would have been such that the #if: would misbehave — or the breakage could have been general, like, some part of the template accidentally got deleted during that process. In either of those subcases, it's most likely, but not certainly, my fault.) In either case, the problem wasn't noticed until well after it had been substituted everywhere. I tried to go back later and figure out what had happened, but I couldn't: the template was just too messy and indecipherable (and it didn't seem like a high priority, since figuring out the problem would not really help in reversing the problem). —Ruakh 17:18, 13 January 2013 (UTC)
- Yes, I've just realised what a massive problem this bot has caused. Who knows how many IPA transcriptions it has stuffed up? 厭煩, 冷靜, 堅固, 搭配, 割傷, 荒誕, 學術, 學問, 記住, 評價, the list goes on and on. Is anyone going to address this? Or is a mass "reversal" of this bot's changes necessary? ---> Tooironic (talk) 03:17, 14 January 2013 (UTC)
- Re: "Who knows how many IPA transcriptions it has stuffed up": I don't think anyone knows for sure, but it's presumably either zero (if they were already messed up) or 183 (if they weren't). Re: "Is anyone going to address this?": Well, mostly I'd been quietly ignoring the issue out of a sense of frustration over the whole thing. (The substitution problems were just the last straw in the whole mess of dealing with this template.) But Μετάknowledge (talk • contribs) has now offered his and your assistance in fixing all of them. :-P If that doesn't work out . . . I'm not capable of fixing these broken pronunciations, but I can certainly go through and remove them, if people want. —Ruakh 03:33, 14 January 2013 (UTC)
- If you are not capable of fixing the mess afterwards, then don't mess with it in the first place. Substituting the unchanged template with related string templates missing will of course generate erroneous pronunciations. To fix all this, restore the related string templates (even if temporarily), replace all {{IPA|...|lang=cmn}} with {{subst:py-to-ipa|... (parameters from {{cmn-...|pin=***}}) }}. 129.78.32.22 03:43, 14 January 2013 (UTC)
- The related string templates weren't missing at the time. But, uh, nice try. Better luck next time? —Ruakh 03:49, 14 January 2013 (UTC)
- I don't know how many related templates you misdeleted whilst doing these substitutions and I don't care. Your mess anyway. 129.78.32.22 03:56, 14 January 2013 (UTC)
- I didn't delete any of them. So: zero. Oh, but that's right, you just said that you don't care. So you're just trolling. Which is convenient for me, because I'm sick of replying to you, and now I don't have to: trolling is grounds for blocking, so the next time you comment here, I can just revert & block. Problem solved. :-) —Ruakh 04:02, 14 January 2013 (UTC)
- @Tooironic: I really intend to do it, just not today. Wanna help? —Μετάknowledgediscuss/deeds 04:04, 14 January 2013 (UTC)
- Sorry to say this but before this is done en masse, the IPA on 接管 is incorrect, cf. the zh.wikt page. Tone sandhi has not been taken into account because whoever was generating this pronunciation was relying on User:Wjcd/py-ipa, a tool that only generates IPA for monosyllabic pinyin. 129.78.32.22 04:47, 14 January 2013 (UTC)
- Er, I can't understand the numerical notation they use... the only tone sandhi rules I know are that bordering 3rds make the first one(s) 2nd(s), that 3rds followed by other tones generally don't come back up, and that 一 and 不 are exceptions. What am I missing? —Μετάknowledgediscuss/deeds 04:58, 14 January 2013 (UTC)
- The actual picture, if it is to be represented in IPA, is more complex than those two rules. There are basically six rules, and these rules make the third tone non-existent in compounds. The four tones in Beijing Mandarin have the values 55, 35, 214 and 51 (superscript numbers 1-5 are equivalent to the IPA tone letters ˩˨˧˦˥). When they combine:
1) 55/35/51 + 214 = 211 + 214;
2) 214 + 214 = 35 + 214;
3) 214 + ø = 21(4);
4) non-ø + 211/214 + non-ø = non-ø + 1 + non-ø;
5) 51 + 51 = 53 + 51;
6) tone sandhi of 一 and 不.
Apply these rules repeatedly, until a stable tonal profile is obtained, where no sandhi rule from above can be applied any more. This gives the final IPA pronunciation. 129.78.32.22 05:13, 14 January 2013 (UTC)
- Wow, thank you! I feel enlightened (and somewhat miseducated). I'll memorize this method straightaway (taking notes for the time being). Anything else that I ought to know but probably don't from using py-to-ipa and reading online guides written by non-linguists? —Μετάknowledgediscuss/deeds 05:54, 14 January 2013 (UTC)
- Re above, I would help but I know nothing about IPA. Nor am I willing to learn. So many other Mandarin-related tasks to be done. Good luck! ---> Tooironic (talk) 03:50, 16 January 2013 (UTC)
- Oh, one small thing. I noticed that you've tagged these pronunciations as "Beijing". Are they really? I assume they're just Standard Mandarin.... ---> Tooironic (talk) 03:52, 16 January 2013 (UTC)
- Did I do that? I intended Putonghua. —Μετάknowledgediscuss/deeds 03:54, 16 January 2013 (UTC)
- Actually, that's how it was created. I don't know enough to challenge that. —Μετάknowledgediscuss/deeds 03:56, 16 January 2013 (UTC)
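The anonymous editor's procedure (rewrite tone values by rule, then repeat until no rule applies any more) is a fixed-point iteration. The Python sketch below illustrates only that mechanism, using the two rules from the list that are plain rewrites of adjacent pairs: rule 2 (214 + 214 becomes 35 + 214) and rule 5 (51 + 51 becomes 53 + 51). Rules 1, 3, 4 and 6 depend on position or on specific syllables (一, 不) and are deliberately left out, so this is not a complete Mandarin sandhi implementation. The function name and the left-to-right application order are assumptions of this sketch; for longer runs of third tones the real outcome can also depend on word grouping, which the sketch ignores.

```python
from typing import List, Tuple

# Pair rules taken from rules 2 and 5 in the list above, written as
# ((left, right), (new_left, new_right)) tone-value pairs.
PAIR_RULES: List[Tuple[Tuple[str, str], Tuple[str, str]]] = [
    (("214", "214"), ("35", "214")),  # rule 2: 3rd + 3rd -> 35 + 214
    (("51", "51"), ("53", "51")),     # rule 5: 4th + 4th -> 53 + 51
]


def apply_pair_sandhi(tones: List[str]) -> List[str]:
    """Rewrite adjacent tone pairs until no rule applies (a fixed point).

    Each pass scans left to right and applies the first matching rule,
    then starts over; the loop stops once a full pass changes nothing.
    """
    tones = list(tones)  # don't mutate the caller's list
    while True:
        for i in range(len(tones) - 1):
            rule = next((new for old, new in PAIR_RULES
                         if (tones[i], tones[i + 1]) == old), None)
            if rule is not None:
                tones[i], tones[i + 1] = rule
                break  # restart the scan after every rewrite
        else:
            return tones  # no rule matched anywhere: stable profile
```

For example, 你好 nǐhǎo (214 + 214) comes out as 35 + 214. Each rewrite lowers the number of 214s or 51s in the list, so the loop always terminates.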
Tbot support
Hi,
There are some questions/requests in User talk:Ruakh/Tbot.js#Testing_and_making_it_work_with_other_languages you may have missed. --Anatoli (обсудить/вклад) 03:29, 14 January 2013 (UTC)
- Thanks. I saw the comments, but I didn't know quite how to reply . . . I'll try. —Ruakh 03:51, 14 January 2013 (UTC)
Aramaic prefixes and suffixes
I noticed an issue with our Aramaic prefix and suffix entries. It seems that whoever added them put the hyphen on the wrong side (I presume to circumvent bidirectionality issues), in addition to the fact that for the Hebrew script it should be a makaf. I corrected several of the Hebrew script ones and even noticed that you yourself fixed -ל to ל־ a while back. The problem is I don't know how many more of them there are and I haven't even gotten to fixing the Syriac script ones. I was wondering if you could generate a list of entries whose titles match "-.*|.*-" and whose body contains an Aramaic L2. If there are too many of them, maybe you could even use your bot to fix them?
I would do it myself but I can't get Python to stop throwing Unicode errors as it reads the dump (which I have found to be caused by a broken XML reader library). Also, would you happen to know if the Syriac script has a version of the hyphen analogous to the Hebrew makaf?
Thanks. --WikiTiki89 03:26, 22 January 2013 (UTC)
- As of the January 10th dump, we had four: -ב, -ד, -ו, and -דיל. So presumably we now have none, though -ו is broken now. (I didn't filter by script or anything, so apparently no Syriac-script entries had this problem.) As for whether Syriac has something makaf-like — I really have no idea. —Ruakh 05:46, 22 January 2013 (UTC)
- Some or all of the entries in question were added by 334a, so I don't know how much he knows about Unicode. Then the problem might be a little more complicated, because I encountered Syriac-script links with this problem; I guess they must have been redlinks (I only looked at them within the wiki code, so I wouldn't have noticed). And I assume it would be harder to find the problem in links than in entries.
- Also -ו isn't really broken, I redirected it to ־ו (the Hebrew 3rd person singular possessive suffix). When I realized the latter didn't exist, I was too lazy to create it. --WikiTiki89 06:01, 22 January 2013 (UTC)
- I'm not sure what "broken" means to you, but to me, a redirect to a redlink is a broken redirect: "broken"! (I mean, I often create a redirect a few minutes before creating the entry it redirects to, but יש גבול ("there's a limit").) —Ruakh 06:25, 22 January 2013 (UTC)
- Thank you. :-) —Ruakh 06:49, 22 January 2013 (UTC)
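WikiTiki89's request (entries whose titles match "-.*|.*-" and whose body contains an Aramaic L2) can be met by streaming the XML dump page by page instead of loading it whole. The Python sketch below uses only the standard library's `xml.etree.ElementTree.iterparse`; it is not the script Ruakh actually ran. It assumes that an Aramaic section is a level-2 heading written `==Aramaic==` and that element tags carry the `{namespace}` prefix that MediaWiki export files use. Note that `re.fullmatch` is needed: with `re.match`, the alternative `.*-` would also accept a title such as `a-b`.

```python
import re
import xml.etree.ElementTree as ET

# Title pattern from the request: starts or ends with an ASCII hyphen-minus.
# fullmatch is required; re.match would also accept "a-b" via ".*-".
TITLE_RE = re.compile(r"-.*|.*-")
# Assumption: an Aramaic section is a level-2 heading "==Aramaic==".
ARAMAIC_L2 = re.compile(r"^==\s*Aramaic\s*==\s*$", re.MULTILINE)


def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export elements carry."""
    return tag.rsplit("}", 1)[-1]


def hyphenated_aramaic_titles(source):
    """Yield titles matching TITLE_RE whose wikitext has an Aramaic L2.

    source is a filename or a binary file object; the dump is streamed
    page by page instead of being loaded into memory at once.
    """
    for _event, elem in ET.iterparse(source):  # 'end' events only
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        if TITLE_RE.fullmatch(title) and ARAMAIC_L2.search(text):
            yield title
        elem.clear()  # release the finished page's subtree
```

A title such as ב־, which already uses the makaf (U+05BE) rather than an ASCII hyphen, is correctly not reported.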
How to use Category:Hebrew personal pronouns?
I noticed that you reverted (all?) my edits where I added Hebrew personal pronouns to Category:Hebrew personal pronouns. If you look at that category, it is now clearly missing many of Hebrew personal pronouns. Do you disagree that those are Hebrew personal pronouns, or do you find that category useless, or something else? --Thv (talk) 06:46, 25 January 2013 (UTC)
- I think the category is fine; the problem isn't that you added entries to it, but that you removed entries from Category:Hebrew pronouns. The headword line should still be {{head|he|pronoun|…}}; Category:Hebrew personal pronouns should be added explicitly at the end of the language section. (Sorry; I had intended to either fix these myself afterward, or leave you a message about it, but then it slipped my mind. Thanks for asking about it.) —Ruakh 15:10, 25 January 2013 (UTC)
Page links dump and invalid page IDs
There are around 5000 links in the pagelinks dump that have nonexistent page IDs (the ID is not present in the main dump). Do you know what the deal is with these? DTLHS (talk) 03:24, 26 January 2013 (UTC)
- If a page-ID is in the range [1, 3846626], but is not present in the latest pages-articles.xml, then I assume that usually (always?) means it was assigned to a now-deleted page. If such a page-ID occurs in pagelinks.pl_from, then I imagine it's simply that the link-record failed to be deleted when the page was. (I don't know anything about the history. Honestly, I wouldn't have been shocked if MediaWiki had simply never deleted such records, but if you only found around 5000 such links, then that's apparently not the case.) Do you notice any obvious pattern in these IDs, like, do they mostly clump in narrow ranges, or anything like that? —Ruakh 03:52, 26 January 2013 (UTC)
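Ruakh's question, whether the orphaned `pl_from` IDs clump in narrow ranges, can be answered by collapsing the sorted orphan IDs into runs. A minimal Python sketch, assuming the `pl_from` values and the page IDs from pages-articles.xml have already been loaded into Python iterables; the loading step is omitted, and the function name and the `max_gap` parameter are illustrative, not part of any dump tooling:

```python
from typing import Iterable, List, Tuple


def orphan_ranges(link_from_ids: Iterable[int],
                  existing_ids: Iterable[int],
                  max_gap: int = 0) -> List[Tuple[int, int]]:
    """Group page IDs that occur in pl_from but not in the pages dump into runs.

    Two orphan IDs fall in the same run if at most max_gap IDs separate them;
    with max_gap=0 only strictly consecutive IDs are merged.
    """
    orphans = sorted(set(link_from_ids) - set(existing_ids))
    runs: List[List[int]] = []
    for pid in orphans:
        if runs and pid - runs[-1][1] <= max_gap + 1:
            runs[-1][1] = pid          # extend the current run
        else:
            runs.append([pid, pid])    # start a new run
    return [(lo, hi) for lo, hi in runs]
```

A handful of wide runs would point to something like a batch deletion whose link records were never cleaned up, while thousands of one-ID runs would suggest scattered individual deletions.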
On certain Latvian grammatical words
Since you asked me a question about the Latvian red links in grammatical templates... The phrases vīriešu dzimte, sieviešu dzimte correspond to masculine and feminine respectively. Now dzimte is one of those 19th-century "neologisms", derived from dzimt "to be born", and means simply "(grammatical) gender". So in principle they should be linked as two words -- "vīriešu" and "dzimte", "masculine" and "gender", i.e., SoP, right? Or should I assume that "vīriešu dzimte" is a phrase simply because it is the "official" name of the masculine gender, it is official grammatical terminology? --Pereru (talk) 15:11, 26 January 2013 (UTC)
- Wiktionarians have argued on this point since time immemorial, and the results of those arguments have been very inconsistent. I think the safest route is probably to link to each word separately, rather than to link to a two-word phrase that may or may not be idiomatic. —Ruakh 16:55, 26 January 2013 (UTC)
February 2013
Linking and tabbed languages
Hello Ruakh --
I saw your recent link fix at ニゴロブナ, thanks for that. Your edit comment brought back to my mind an idea I've been toying with for a bit, that of creating a template for listing JA terms, similar to {{l|ja}} but specifying the lang for the two transliterations (kana and romaji).
For instance, the JA editors I'm aware of (myself, Haplology, I think Anatoli and James Jiao) have used wikicode formatting in lists like that seen at 御#Derived_terms:
* {{l|ja|御子|tr=[[みこ]], ''[[miko]]''}}: a shrine maiden
Your mention that this might screw up tabbed languages makes me wonder if this format is sufficient, but I don't use tabbed languages and don't really know. My idea was to leverage {{l|ja}} into something that might look like {{l-ja|御子|みこ|miko}} and be equivalent to {{l|ja|御子}} ({{l|ja|みこ}}, ''{{l|ja|sc=Latn|miko}}''), with all links properly pointing to the correct language.
Your thoughts? -- Eiríkr Útlendi │ Tala við mig 06:24, 6 February 2013 (UTC)
PS -- no worries about the other day; we all have days like that. o_O
- I created {{ja-l}} a few months back for this purpose, doing exactly what you describe; but it doesn't seem to have had any uptake. Since then, quite a few more such language-specific {{l}} templates have been created, as subpages of {{l}}, so I guess we should create {{l/ja}}. (It can just redirect to {{ja-l}}. Or we can do it the other way, moving {{ja-l}} to {{l/ja}}.) —Ruakh 15:27, 6 February 2013 (UTC)
- That template looks more complicated than just a simple linking template, though. Do you think it is a viable candidate for {{l/ja}}? —CodeCat 16:56, 6 February 2013 (UTC)
- Yes. At least, I think that any {{l/ja}} would need to include all of this complexity. No? —Ruakh 02:38, 7 February 2013 (UTC)
- I'd prefer it if the basic linking templates were kept as simple as possible. That doesn't mean a template that combines them can't be created, but I would imagine there are situations where the extra code isn't necessary. It's always possible to make a bigger template out of smaller ones, but the reverse is not true, so the common denominator should probably be kept low. —CodeCat 02:55, 7 February 2013 (UTC)
- But I think that this is as simple as it gets for Japanese. Or at least, it's the simplest thing that could be called "l". (I mean, I'm not dogmatic about it. If you have an idea for how you think {{l/ja}} should look, I would certainly keep an open mind. But that's how it seems to me right now.) —Ruakh 05:49, 7 February 2013 (UTC)
- I was thinking that it would only link to one word, so it would resemble {{l/sh/Cyrl}} but with another script. We often link to Serbo-Croatian words in pairs (both scripts, since both get an entry) but the linking templates don't support this directly since it is often desired not to link to a pair of words. I figured Japanese could work the same way, with the three (kanji, hiragana, romaji) or four (katakana too) representations of the word linked individually. {{ja-l}} already makes multiple links, so it can act as a convenient replacement for multiple instances of {{l/ja}} together. —CodeCat 14:21, 7 February 2013 (UTC)
- I see what you're saying. I guess the thing is, that may be what {{l}} should be, but it's not what it is: it may have been intended, in part, as a single-link template, but what it is is an approximation to {{onym}}. (An inferior one, granted, but a very widely used one.) So I think {{l/ja}} needs to be what {{ja-onym}} would be, if {{ja-onym}} existed. —Ruakh 03:11, 8 February 2013 (UTC)
- Hmm, and thinking it through further, this becomes a more complicated problem space -- some JA terms spelled in kanji have multiple readings, such as 魚釣り, which could be read as either うおつり uotsuri or さかなつり sakanatsuri. This is common enough that the template should ideally be able to handle probably up to three pairs of kana/romaji readings. The logic used at {{compound}} might be a useful reference. -- Eiríkr Útlendi │ Tala við mig 17:17, 6 February 2013 (UTC)
- FWIW, I've never used {{ja-l}} simply because I didn't know it existed. (^^); -- Eiríkr Útlendi │ Tala við mig 17:19, 6 February 2013 (UTC)
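The expansion Eiríkr proposes, where a hypothetical {{l-ja|御子|みこ|miko}} is equivalent to {{l|ja|御子}} ({{l|ja|みこ}}, ''{{l|ja|sc=Latn|miko}}''), can be written out as a small wikitext generator, which makes the intended output easy to check, including the multiple readings raised for 魚釣り. The Python sketch below is illustrative only: `l_ja` is a hypothetical name, it is not how {{ja-l}} is implemented, the limit of three reading pairs comes from the discussion, and the semicolon between readings is an assumption, since the discussion does not specify a separator.

```python
def l_ja(term, *readings):
    """Hypothetical sketch of the wikitext a {{l-ja}} template might emit.

    term: the written form (e.g. kanji); readings: (kana, romaji) pairs.
    Produces {{l|ja|term}} followed by each reading as a kana link and an
    italicised Latin-script romaji link, mirroring the proposed example.
    """
    if len(readings) > 3:
        raise ValueError("this sketch supports at most three kana/romaji pairs")
    out = f"{{{{l|ja|{term}}}}}"
    parts = [
        f"{{{{l|ja|{kana}}}}}, ''{{{{l|ja|sc=Latn|{romaji}}}}}''"
        for kana, romaji in readings
    ]
    if parts:
        # Assumption: multiple readings are separated by semicolons.
        out += " (" + "; ".join(parts) + ")"
    return out
```

With a single reading, `l_ja("御子", ("みこ", "miko"))` reproduces the equivalent wikitext given in the proposal exactly.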
I'm sorry... can we try to sort it out?
I'm sorry for my outburst in the Grease Pit. Sometimes I get a bit overly emotionally attached to certain things because I have a strong idea of what is right or wrong. I was also a bit frustrated at the prospect of having yet another public discussion be derailed by the issue, when it's clearly just between us. I'd like to understand what the problem is in a calmer setting.
First let me explain what I think is your stance, so that we can get any misconceptions out of the way? As far as I'm aware, you prefer certain code templates to have prefixes so that there is a technical barrier for their usage, and only templates that have been explicitly coded around that barrier will accept such codes. I also think that you want those prefixes to be used so that Wiktionary users realise that they are not "normal" codes, and will hopefully act accordingly when using them. I remember you saying something like "different things should look different". Is that correct?
Now, with Lua, we don't actually have a need for the prefixes themselves, because there are other ways to implement "prefixes". For example, we could put them in separate modules, so that anyone who is using those codes will be explicitly aware that they are to be imported and used from a distinct location. So even if a technical barrier is desirable, there may be better and more "Lua-like" ways to do it than with prefixes embedded in the code string. My objection to these technical barriers in general is that while I do agree that different things should look different, regular codes don't actually work that differently from reconstructed codes. In fact, as far as I'm aware, they only differ when linking is concerned, since reconstructed languages are placed in a different location and use a different naming scheme. And implicitly that means that all entries need a sort key, but sort keys may be desirable for non-Appendix languages too (which Lua will be a tremendous help for, by the way!). All other uses are the same: expanding their names (like in the multitude of category boilerplate templates), categorising their entries (Category:Proto-Germanic language is no different from Category:English language), and so on.
So my objection is that while it may be good to make the difference explicit, it also makes it more difficult to handle all cases when there are no differences at all. A template like {{poscatboiler}} should not need to have special support for reconstructed codes, because it treats them exactly the same as regular codes. Similarly for {{head}}, which should just work fine for appendix languages, except for the links. When every template needs special support, it implicitly disables that template for reconstructed languages until someone takes the time to fix it. Which can be frustrating at times. —CodeCat 14:48, 15 February 2013 (UTC)
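CodeCat's claim that reconstructed and regular codes differ only when linking can be made concrete with a language-data table that stores a flag instead of a "proto:" prefix. The sketch below is in Python for illustration, although the discussion concerns Lua modules. The table contents, the `reconstructed` field, and the `Appendix:<name>/<term>` link format are assumptions made for this example, and the sketch illustrates CodeCat's position rather than a design the participants agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    code: str            # e.g. "gem-pro", stored without any "proto:" prefix
    name: str            # canonical name used in links and categories
    reconstructed: bool  # True if entries live in the Appendix namespace


# Hypothetical sample data.
LANGUAGES = {
    "en": Language("en", "English", False),
    "gem-pro": Language("gem-pro", "Proto-Germanic", True),
}


def language_category(code: str) -> str:
    """Categorisation is identical for both kinds of language."""
    return f"Category:{LANGUAGES[code].name} language"


def link_target(code: str, term: str) -> str:
    """Linking is the one place where the flag matters."""
    lang = LANGUAGES[code]
    if lang.reconstructed:
        return f"Appendix:{lang.name}/{term}"
    return term
```

Ruakh's counterpoint in the replies is that the distinction should be visible in the wikitext itself; a flag like this keeps it only in the data, which is exactly the trade-off being argued over.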
- I accept your apology. I'm sorry, too.
- Re: "I was also a bit frustrated at the prospect of having yet another public discussion be derailed by the issue, when it's clearly just between us": I don't follow. It seems to me that the only way that it can be "just between us" is if absolutely no one else cares one way or the other; and if that were the case, then public discussion would be both unnecessary and impossible.
- Re: "As far as I'm aware, you prefer certain code templates to have prefixes so that there is a technical barrier for their usage, and only templates that have been explicitly coded around that barrier will accept such codes": This is not true. I actually really hate the templates that try to "code around" this — {{langprefix}} and so on.
- Re: "I also think that you want those prefixes to be used so that Wiktionary users realise that they are not 'normal' codes, and will hopefully act accordingly when using them. I remember you saying something like 'different things should look different'": Yes.
- Re: first half of your third paragraph, from "Now, with Lua, we" to "prefixes embedded in the code string": This section presupposes that I want a technical barrier. Now that I've clarified that I don't want that, I think this section is obsolete?
- Re: second half of your third paragraph, or more specifically, re: "regular codes don't actually work that differently from reconstructed codes": I think they do. I think that the difference between "languages we include" and "languages we don't include" (such as reconstructed languages) is an absolutely fundamental distinction. (By comparison: I'm sure you wouldn't want {{term|parabola|lang=la}}, {{term|parabola|lang=fr}}, {{term|parabola|lang=es}} to generate “parabola, parole, palabra” on the grounds that the only difference between the three words is the location of the entry for them.)
- To put this another way — if you really think that reconstructed languages should be treated just like regular languages, then you should propose that they be put in mainspace. Or if you really think that they belong in appendices, then we shouldn't be treating them like regular languages in all these other respects. But I think that this half-measure — putting them in appendices, but then pretending that these appendices are just regular entries — is really the worst of both worlds.
- Re: your last paragraph: I mostly agree, except that I reach the opposite conclusion: this is exactly why these prefixes need to appear directly in the wikitext, rather than forcing templates like {{poscatboiler}} to add special logic to hack around the missing prefixes.
—Ruakh 21:30, 16 February 2013 (UTC)
- Ok, I think I understand, but I don't quite agree. What exactly is there to gain from making editors so aware of the distinction? I mean, is there ever a problem when they're not aware, given that it's pretty clear through other means that we treat certain languages differently? Do we really need to make it more obvious that Proto-Germanic belongs in an appendix? And why do you think that adding "proto:" to a language code will somehow enable editors to make that connection? To illustrate this: several editors have, in the past, added {{head}} to reconstructed entries. That is the kind of situation where I think that {{head}} should just work. People expect it to work, and don't understand why it doesn't. I highly doubt that putting a prefix on the code will make even the slightest difference; after all, the editors in question were already very much aware that the language was different, because it was in the appendix namespace and had a different name. Yet that fact apparently did not prevent them from concluding that {{head}} should work there as it does in mainspace. So given that reality, adding "proto:" to the code seems like nothing more than pointless bureaucracy, which really does not help in the slightest with what you intend it to do. I agree with you that {{langprefix}} was a bad idea, but it was created because there was a need for it. That need would not have existed if we had decided back then to drop the prefixes from the templates. So, perhaps ironically, {{langprefix}} would not have existed today had you conceded then. —CodeCat 04:59, 17 February 2013 (UTC)
- I'm not suggesting that we add proto: to the language code: it's already there. (See e.g. {{proto:gem-pro}}.) I'm suggesting that we not remove it. (And I'm not sure what you mean by "pointless bureaucracy", anyway, since assigning codes is inherently an exercise in bureaucracy, and there's no difference bureaucracy-wise between assigning codes like proto:gem vs. gem-pro vs. Proto-Germanic.) And you're right about {{langprefix}}: if I hadn't insisted on making a distinction, or if you hadn't insisted on making no distinction, or if Daniel hadn't insisted on making all templates as complicated as possible, then we wouldn't have ended up with the worst-of-all-worlds situation that we have now. —Ruakh 15:40, 17 February 2013 (UTC)
- Well, considering that we'd still want to mark our text HTML-wise as Proto-Germanic, we'd necessarily have to use something like "proto:gem-pro". Being HTML-correct is really the main reason why we don't use the prefixes as part of the "proper" code, and the consequence is that the "proto:" that is part of the language template's name becomes nothing more than a technical barrier, which in turn necessitated {{langprefix}}. I think we both seem to agree that it's not a good thing. So which options do we have?
- Keep as it is. That seems like the most complicated option to me, because prefixed template names don't easily translate to prefixed names in Lua. Unless we somehow decide to store the codes with the prefixes, and add them each time we want to look them up (essentially, Lua-fy {{langprefix}} along with the rest)? That seems rather cumbersome, and like I mentioned, we could just decide to use different tables for reconstructed languages to the same effect, if this is what we want. Nevertheless, if we decide that the method for accessing codes is going to be different for different types, then some analogue to {{langprefix}} is going to be inevitable.
- Remove the prefixes entirely (use just "gem-pro" everywhere). This will probably be the easiest to implement, because we already use prefixless codes in all of our entries. This option allows us to get rid of any nastiness that prefixes involve, and has the benefit that the code that is typed in articles is the same one that will appear in the HTML lang= attribute.
- Add the prefixes as part of the canonical code (use just "proto:gem-pro" everywhere). This will require a bot to update all the uses. This makes the prefix somewhat redundant because the code already ends with -pro, but that will not apply to, for example, Klingon (conl:tlh in this scheme). It does have the advantage (to you at least) that it's explicit that the code is "different". However, a disadvantage is that any code that creates HTML (which in practice will be most if not all) will have to remove the prefix before putting it in the lang= attribute. Essentially, then, we end up with a kind of "reverse langprefix", except one that will need to be used far more often than langprefix itself currently is.
- Some other option?
—CodeCat 17:30, 17 February 2013 (UTC)
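The third option implies a "reverse langprefix": every piece of code that emits HTML would first strip the prefix. A minimal sketch of such a helper, assuming a prefix is always a run of letters followed by a colon (as with proto: and conl:); the name strip_prefix is hypothetical:

```lua
-- Hypothetical "reverse langprefix": split "proto:gem-pro" into the bare
-- code and the prefix, so only the bare code reaches the lang= attribute.
local function strip_prefix(code)
    -- "^(%a+):(.+)$" captures a leading run of letters, then a colon.
    local prefix, rest = code:match("^(%a+):(.+)$")
    if prefix then
        return rest, prefix
    end
    -- No prefix: return the code unchanged and nil for the prefix.
    return code, nil
end
```

Stripping only recovers the bare code; whether that bare code is itself an acceptable lang= value is a separate question.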
- lang="gem-pro" is not valid, so we actually don't (or at least, shouldn't) want that in our HTML. —Ruakh 01:39, 18 February 2013 (UTC)
- Re: first sentence: Yes, I absolutely agree. (This also applies, in a slightly different way, to a handful of exceptional codes used by WMF in general, rather than en.wikt specifically.) Re: second sentence: Not exactly ISO 639, but lang="gem-pro" isn't valid. What the spec requires is that the value of lang="…" "be a valid BCP 47 language tag, or the empty string", where the empty string means "the primary language is unknown".[1] A BCP 47 language tag is not the same as an ISO 639 language code, though they're related, and have a lot of overlap. (For example: es "Spanish" is both; es-ES "Spanish Spanish" is a valid BCP 47 language tag but not an ISO 639 language code (the es subtag is an ISO 639 language code, the ES subtag is an ISO 3166 country code); and spa "Spanish" is a valid ISO 639 language code but not a valid BCP 47 language tag (because BCP 47 requires that two-letter codes be preferred to their three-letter synonyms).) For a friendly-but-thorough guide to BCP 47, see http://www.w3.org/International/articles/language-tags/ and http://www.w3.org/International/questions/qa-choosing-language-tags. Re: third sentence: It really depends. In the case of gem-pro, we can keep the gem subtag, but the pro subtag is invalid (it means Old Provençal, and can only be used as the first subtag of a language tag). We might write lang="gem" (perfectly valid, but vague: it means "some Germanic language"), or we might write something like lang="gem-x-proto" (also valid: x means "private use", and everything after it merely has to be syntactically valid, since its semantics are defined by private agreement). Obviously neither of these is ideal, but I think we're unlikely to find any better approach. (You can read the three pages I linked to, and form your own opinions.) —Ruakh 03:18, 18 February 2013 (UTC)
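To make the difference between "syntactically well-formed" and "valid" concrete, here is a rough, hypothetical checker for a simplified subset of BCP 47 syntax: every subtag is 1 to 8 ASCII letters or digits, the first subtag is 2 or 3 letters or x, and x must be followed by at least one subtag. It ignores scripts, regions, variants, extensions and grandfathered tags, and it never consults the IANA registry, so "gem-pro" passes even though pro is not usable there:

```lua
-- Rough, hypothetical check of a simplified subset of BCP 47 syntax.
-- Syntax only: the IANA subtag registry is never consulted.
local function is_well_formed(tag)
    if tag == "" then
        return true -- lang="" is allowed and means "primary language unknown"
    end
    local subtags = {}
    -- Appending "-" makes every subtag, including the last, end in a hyphen.
    for subtag in (tag .. "-"):gmatch("([^%-]*)%-") do
        table.insert(subtags, subtag)
    end
    for _, s in ipairs(subtags) do
        -- Each subtag: 1 to 8 ASCII letters or digits (no "_", no empties).
        if not s:match("^%w%w?%w?%w?%w?%w?%w?%w?$") then
            return false
        end
    end
    local first = subtags[1]:lower()
    if first ~= "x" and not first:match("^%a%a%a?$") then
        return false
    end
    -- A private-use "x" must be followed by at least one subtag.
    for i, s in ipairs(subtags) do
        if s:lower() == "x" then
            return i < #subtags
        end
    end
    return true
end
```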
- I remember reading something about x- but I wasn't sure what it would be needed for. As I understand it right now, a tag is split into parts with hyphens as separators. When a part is "x" it means "everything following is nonstandard". So the x-pro part means "do not try to interpret 'pro' as you normally would". If I'm not mistaken, "x-gem-pro" would also be valid? But browsers would not be able to understand from such a code that it is Germanic, whereas with "gem-x-pro" they would parse "gem". Is that correct? —CodeCat 03:41, 18 February 2013 (UTC)
- Yes, that's correct. (Well, except the word "parse". Technically even the stuff after x is still parsed, so it still has to be syntactically valid. Something like lang="gem-x-proto_germanic" would not be valid. But I assume you're just using the word "parse" colloquially, and I shouldn't read too much into it?) —Ruakh 04:01, 18 February 2013 (UTC)
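The behaviour described here can be sketched as splitting a tag on hyphens and separating what comes before the private-use singleton x from what comes after it; only the part before x carries standard meaning. The function names are hypothetical, and the sketch assumes the tag is already well-formed:

```lua
-- Hypothetical helper: separate the standard subtags from the private-use
-- subtags that follow an "x" singleton.
local function split_private_use(tag)
    local before, after = {}, {}
    local seen_x = false
    for subtag in (tag .. "-"):gmatch("([^%-]*)%-") do
        if not seen_x and subtag:lower() == "x" then
            seen_x = true -- everything from here on is private use
        elseif seen_x then
            table.insert(after, subtag)
        else
            table.insert(before, subtag)
        end
    end
    return before, after
end

-- The primary language subtag a consumer could interpret, or nil if the
-- whole tag is private use (as in "x-gem-pro").
local function primary_language(tag)
    local before = split_private_use(tag)
    return before[1]
end
```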
- Yes, sorry, I meant it more in the sense that it doesn't try to interpret the meaning of what it reads. In any case, if that is how it is, then I think we should change the codes we currently use so that they match the standard. It seems a bit hypocritical that we worry about other parts of the lang= attributes but ignore this. I think the easiest way would be to insert -x- in all the exceptional codes, but we could also change -pro to -proto if that is allowed. We could, as an exception, decide to leave out that part in template names, so that the old name is still used for naming templates (out of convenience), like {{gem-verb}} or {{ine-noun}}. Going back to your desire to make proto-languages appear distinctive... do you think that it's distinctive enough if the code ends in "-pro(to)"? —CodeCat 04:14, 18 February 2013 (UTC)
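If the exceptional proto-language codes were rewritten as proposed, the conversion could be a single pattern rule. A hypothetical sketch, assuming every such code has the shape family-pro and that the target form is family-x-proto (template names such as {{gem-verb}} would be unaffected):

```lua
-- Hypothetical conversion of an existing "-pro" code to a private-use tag.
local function to_html_lang(code)
    -- "gem-pro" -> "gem-x-proto": everything after "x" is private use.
    local family = code:match("^(%a%a%a?)%-pro$")
    if family then
        return family .. "-x-proto"
    end
    -- Anything else is assumed to be usable as-is (e.g. "nl").
    return code
end
```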
- Yeah, {{context|…|lang=proto:gem-x-proto}} might be overkill. Personally I'd prefer {{context|…|lang=proto:gem}}, but I think I can accept {{context|…|lang=gem-x-proto}} as a compromise. (It's not ideal from my standpoint, because the -x- is really there because Ethnologue doesn't include the language, rather than because we don't — we might end up having -x- in some languages we allow, and lacking it in some languages we don't — but I think I can accept it.) —Ruakh 04:33, 18 February 2013 (UTC)
- Yes, I'd prefer -proto over -pro. (My impression is that the only reason -pro was introduced is that someone thought it was more standard to use a three-letter subtag. Which is sort-of true — real extension subtags are three letters — but pro isn't and can't be a real extension subtag, so it's nonstandard either way.) —Ruakh 04:42, 18 February 2013 (UTC)
olibanum
Would appreciate some {{attention}} here if you have a sec…. Ƿidsiþ 07:36, 16 February 2013 (UTC)
- Done. —Ruakh 21:31, 16 February 2013 (UTC)
Cheers. Ƿidsiþ 21:51, 16 February 2013 (UTC) Another one: tohu-bohu! Ƿidsiþ 15:29, 19 February 2013 (UTC)
Whitelisting pages
Happy Purim. I'm almost positive you know how to whitelist pages; so do I in general, but this case is complicated enough that I'm afraid of getting it wrong and wonder if I can bother you, please, to handle it (if, of course, you agree). The relevant discussion was 'closed' here and here's a handy link to the JS.—msh210℠ (talk) 05:37, 24 February 2013 (UTC)
- Happy Purim! And yeah, I'm the one who wrote MediaWiki:Gadget-PatrollingEnhancements.js, so I'm generally your best bet for changes to it. (Obviously we all try to write JS that anyone else can understand and edit, but that's easier said than done.) I've now made the change that you indicate; see MediaWiki:Gadget-PatrollingEnhancements.js?diff=19626792&oldid=19232663. —Ruakh 06:22, 24 February 2013 (UTC)
- Um, can somebody please create an entry for פורים? TIA —Μετάknowledgediscuss/deeds 06:52, 24 February 2013 (UTC)
- Done.—msh210℠ (talk) 07:33, 24 February 2013 (UTC)
- Thanks! —Μετάknowledgediscuss/deeds 15:44, 24 February 2013 (UTC)
Name of Module:he-utilities
I had already created Module:sl-common for a similar purpose. It's probably better to use the same names, so which should we use? —CodeCat 01:37, 25 February 2013 (UTC)
- Let me preface this by saying that I think we're in a bit of a discovery-and-experimentation phase, so for things that don't affect anything and are easily changed, it might not be worth worrying too much about consistency just yet. I mean, it's nice for different languages' modules to have similar names, so they're easier to discover and remember, but it doesn't really matter if they don't. Note that template-names are not always consistent between languages, and it's never really caused a problem; the difference in behaviors of different languages' templates has always dwarfed the difference in names. And while it's a bit ugly to do so, we can always create what might be called "shims" or "pseudo-redirects", e.g. creating Module:he-common as return require('Module:he-utilities'). (Disclaimer: not tested.) But I don't actually object to consistency, of course, and since you've raised the point, I'll reply.
- Re: "common" vs. "utilities": I have no preference. To me the names imply slightly different things, but both seem applicable here. If you prefer "common", feel free to move Module:he-utilities accordingly.
- —Ruakh 03:34, 25 February 2013 (UTC)
Red (well, black) links in Latvian inflection tables
You had asked me about a month ago to add the Latvian words for certain grammatical notions that are mentioned in Latvian inflection tables. I have just finished doing that, and I've also crossed out the respective items at User:DTLHS/WantedPages. I just wanted to let you know. --Pereru (talk) 21:00, 25 February 2013 (UTC)
- Thanks! —Ruakh 07:21, 26 February 2013 (UTC)
Lua: Calling a function through a string that has its name
In Module:nl-verb, the export.conjugate function has what is basically a switch statement that "forwards" the call to the correct function. But that could be written more neatly if I could just tell it to call "conjugate_" .. conj_type as though it were a function. In other words, to call a function through a string with its name (which you can construct dynamically). Do you know if it's possible to do this? —CodeCat 21:46, 27 February 2013 (UTC)
- I don't believe there's any way to do exactly what you describe — AFAIK locally-scoped identifiers are only available statically — but it's easy to do approximately what you describe; I've edited Module:nl-verb to show what I mean. Actually, you were already most of the way there. —Ruakh 05:07, 28 February 2013 (UTC)
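The "approximately what you describe" approach is to keep the functions in a table and index it with a string built at run time. A hypothetical sketch; the conjugation types and return values are placeholders, not the contents of Module:nl-verb:

```lua
-- Hypothetical dispatch table: keys are conjugation types, values are the
-- functions that handle them.
local conjugations = {}

conjugations["weak"] = function(stem)
    return "weak conjugation of " .. stem
end

conjugations["strong"] = function(stem)
    return "strong conjugation of " .. stem
end

-- Replaces an if/elseif chain: look the function up by name and call it.
local function conjugate(conj_type, stem)
    local f = conjugations[conj_type]
    if not f then
        error("unknown conjugation type: " .. tostring(conj_type))
    end
    return f(stem)
end
```

A side effect of this design is that only names present in the table can be reached, so an arbitrary string passed in from a template cannot select some unrelated helper function.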
- There is a table that stores all globals — it's _env — but due to the nature of lexical scope, local variables are a different beast entirely. (But why does it matter?) —Ruakh 15:36, 28 February 2013 (UTC)
- Well, before you changed it, those functions were global, so they could have been called through that table without making one myself. On the other hand, for "security" reasons it might be better to explicitly restrict the list of callable functions (so that someone doesn't try to invoke, say, the function "make_table"). —CodeCat 15:47, 28 February 2013 (UTC)
- Re: "those functions were global": Oh my gosh, you're right. I had assumed that function declared local identifiers, like in JavaScript, but it doesn't. That explains why you were able to call functions before their declarations.
I think it goes without saying that we should avoid global variables, including global functions.
—Ruakh 15:57, 28 February 2013 (UTC)
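The scoping point can be checked in plain Lua 5.1: a bare function statement assigns a global, which is reachable through the _G table, while local function creates a local that _G never sees. The function names below are hypothetical:

```lua
-- "function name() ... end" is sugar for "name = function() ... end",
-- i.e. an assignment to a global variable.
function global_helper()
    return "global"
end

-- "local function" declares a local; it is not stored in _G.
local function local_helper()
    return "local"
end

-- Globals can be looked up by name through _G; locals cannot.
local found_global = _G["global_helper"]
local found_local = _G["local_helper"]
```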
- I don't think that makes sense. Being able to order functions the way we want, without having to worry about declaring them in advance, is a very good thing. Functions are global in most other languages, too. Then again, I wonder what significance "global" has in this case. If a function is global in one module, can it be called globally from another module that imports it? —CodeCat 16:11, 28 February 2013 (UTC)
- That is a fascinating question. I had simply taken for granted that "global" means "global" — why else, for example, would we be writing local p = {} in all our modules — but you are quite right to ask it, because as far as I can tell by testing, global variables are actually not shared between modules. Not only does a module not see the globals of a module that it imports, but what's more, even the MediaWiki-provided globals are not really shared. For example, every module has mw, but setting mw.foo in one module does not affect other modules that import it. (Incidentally, the same is true of the debug console: it doesn't see globals created within the body of the module you're debugging.) So, yeah, disregard my previous statement: we can use "globals" all we want.
By the way, I mentioned _env above, but I was misreading the documentation: it's actually _G.
—Ruakh 03:56, 1 March 2013 (UTC)
March 2013
Something funny
Lua's syntax requires that functions have names that are identifiers, but since functions can also be put in tables, you can really name them anything. I discovered that that includes:
local export = {}
export[""] = function(frame)
    -- ...
end
return export
{{#invoke:something|}}
I thought that was kind of funny, since I don't know many other languages that let you do this. —CodeCat 02:34, 2 March 2013 (UTC)
- Maybe I'm missing what you're getting at, but it seems to me that the same is true in most scripting languages, including Perl ($foo{''} = sub { ... }), JavaScript (window[''] = function () { ... }), and Python (foo[''] = lambda : ...). You'll find this in any language that offers associative arrays and first-class functions. —Ruakh 07:04, 2 March 2013 (UTC)
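A minimal illustration of the point: the empty string is an ordinary table key, so a function stored under it can be fetched and called like any other. The frame below is a bare stand-in table, not a real Scribunto frame object:

```lua
local export = {}

-- Functions are first-class values, and "" is a valid table key.
export[""] = function(frame)
    return "called with " .. tostring(frame.args[1])
end

-- Calling it directly; { args = { "hello" } } stands in for a frame.
local result = export[""]({ args = { "hello" } })
```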
How do you think I should do this?
I would like to work on converting most uses of {{head}} in Dutch to templates specific to Dutch (which are Lua-fied and faster). But many of them would essentially be the same, because for most of them only a headword is actually needed, nothing else. It definitely makes sense for there to be a single Lua function that does all of them (with a parameter to specify which PoS to categorise in). But I'm not sure what to do on the template side of things. The current approach taken by most languages is to have a single template for each PoS, so with Lua that would mean having many templates all invoke the same Lua function, with a parameter to specify the PoS. An alternative approach is to create {{nl-head}} and give it a parameter that is then passed onto Lua. Basically, it would move the PoS-parameter from being in the templates to being in the entries themselves. However, there is the danger that some editors will see {{nl-head|preposition}} and think, hey, I can probably write {{nl-head|noun}} too! And that's something we definitely don't want. —CodeCat 15:40, 5 March 2013 (UTC)
- Re: first sentence: We really need to Lua-ify {{head}}, too. For obvious reasons, I think it'll be a while before we figure out languages well enough that we can really Lua-ify {{head}} properly, but I don't think there's much benefit to moving away from {{head}} in the meanwhile (at least for that reason). Besides, unless there are ==Dutch== entries with large numbers of POS sections, the Dutch-specific templates aren't "faster" in any meaningful sense. (Is a page that takes 18.1ms "faster" than a different page that also takes 18.1ms?)
- Re: {{nl-head|noun}}: If that's really something that we never want, then I don't think there's really a problem, since {{nl-head}} can simply add a cleanup category, or simply call the function for {{nl-noun}}, which will add a cleanup category due to missing arguments.
—Ruakh 16:41, 5 March 2013 (UTC)
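One way to read the cleanup-category suggestion is a single shared function that accepts only the parts of speech needing nothing beyond a headword, and flags anything else. A hypothetical sketch; the table contents and category names are illustrative assumptions, not existing module code:

```lua
-- Hypothetical: parts of speech that need only a headword, mapped to the
-- category each should add.
local simple_pos = {
    preposition = "Dutch prepositions",
    interjection = "Dutch interjections",
}

-- Hypothetical cleanup category for anything else, e.g. {{nl-head|noun}}.
local CLEANUP = "Dutch entries needing cleanup"

local function make_head(pos, headword)
    local category = simple_pos[pos] or CLEANUP
    return headword, { category }
end
```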
- You haven't really answered my question, though. I am wondering whether having separate templates like {{nl-prep}}, {{nl-interj}} the way we currently do is preferable to having a single {{nl-head}} with a parameter. I do prefer it for consistency reasons, but I thought that some people might not like it because it leads to creating a separate template for every PoS (even if the Lua code behind each is shared). —CodeCat 18:06, 5 March 2013 (UTC)
- I have no preference. This question seems better-suited to Wiktionary talk:About Dutch. —Ruakh 06:13, 6 March 2013 (UTC)
Memory Stick and memory stick
Hi there. I wasn't aware that removal of (supposed) definitions like that required verification in that way (although it makes sense, I'm simply not hugely familiar with Wiktionary rules; I have spent a lot more time editing Wikipedia and am more used to the burden of proof being the other way round). Thanks for pointing me to rfv-sense. Alphathon (talk) 06:55, 7 March 2013 (UTC)
- The burden of proof is still the same way 'round: the sense will be removed if no one presents evidence for it. (You don't have to present evidence for its nonexistence, or anything like that.) It's just that we prefer to leave the content in-place, with the warning tag, while the initial discussion is going on. —Ruakh 07:04, 7 March 2013 (UTC)
Split comma-separated genders
I am not sure if that is a good idea. Besides making the module slower, it would also end up enabling that behaviour for all templates, even though most of them would probably not need it. I would prefer it if the calling module would perform the split, rather than the gender-and-number module. —CodeCat 16:20, 8 March 2013 (UTC)
- I figured that most templates would need it, and that it was best to handle it consistently in the ideal way, rather than having some templates split on comma, some split on space, some that support only a single gender/number specification, and so on. (Incidentally, if you want to change it to accept only commas or only spaces, I'd be down with that. I wasn't sure which was better, but supporting both is probably actually not good, because then people will be inclined to use a comma followed by a space, and the module doesn't currently handle empty specifications intelligently. Alternatively, of course, we could change the module to handle empty specifications intelligently.) The major group of templates that don't need it are the ones that completely generate their gender/number information internally (rather than taking it from template parameters), and of course, such templates can simply ignore this feature. I don't buy the "making the module slower" argument, because the module always calls split at least once anyway, so even if split were this incredibly expensive function that dwarfed all other aspects of the module, this would still only be a less-than-factor-of-two slowdown. —Ruakh 03:38, 9 March 2013 (UTC)
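Handling empty specifications "intelligently" could be as simple as iterating over runs of non-separator characters, so that "m, f" yields two specifications rather than three. A hypothetical sketch that accepts both commas and whitespace as separators:

```lua
-- Hypothetical splitter: commas and whitespace both separate
-- specifications, and empty pieces are never produced.
local function split_specs(param)
    local specs = {}
    for spec in param:gmatch("[^,%s]+") do
        table.insert(specs, spec)
    end
    return specs
end
```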
- The slowdown for a single call will not be terribly significant, but this function may be called hundreds of times on a page because of the genders in translation tables. So every small amount can easily multiply, and it's probably a good idea to remove anything that isn't strictly necessary. I don't know which templates would actually need it, though. Can you give an example of a case where multiple genders can't be passed in as a list? —CodeCat 10:13, 9 March 2013 (UTC)
- Templates don't support lists. (BTW, the Lua term is actually "sequence", but I'll use your terminology for this comment.) So any Luified template that accepts multiple genders from the user will have to offer some non-list mechanism for doing so. One approach is to have a series of separate parameters (say, 1 and g2 and g3) and then assemble them into a list. Another is to have a single parameter and split it into a list. I think the latter is clearly superior from the user's standpoint.
Re: "it's probably a good idea to remove anything that isn't strictly necessary": Fortunately, it's obvious that you don't actually believe that, because if you did, then the module would only contain export.format_single and export.COMMA (the latter being "'', ''"), and calling modules would assemble their genders into a string rather than into a list. Instead, you made an effort at encapsulation, at exposing only a single function, export.format, for software-engineering reasons. I think that's fine. But it is clearly inferior from a performance standpoint, and the only reason to do it is if we care about humans.
—Ruakh 19:14, 9 March 2013 (UTC)
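The two input conventions can be compared side by side. In this hypothetical sketch, args stands in for frame.args (positional parameter 1 plus named g2, g3, ...); both functions yield the same list of strings for equivalent input:

```lua
-- (a) Separate parameters: 1, g2, g3, ... (stops at the first gap).
local function from_separate_params(args)
    local genders = {}
    if args[1] and args[1] ~= "" then
        table.insert(genders, args[1])
    end
    local i = 2
    while args["g" .. i] and args["g" .. i] ~= "" do
        table.insert(genders, args["g" .. i])
        i = i + 1
    end
    return genders
end

-- (b) A single parameter split on commas (no whitespace trimming here).
local function from_single_param(args)
    local genders = {}
    for g in (args[1] or ""):gmatch("[^,]+") do
        table.insert(genders, g)
    end
    return genders
end
```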
- Ok, I understand that part. But the main module doesn't have to support the list-splitting itself. For example, take {{es-noun}}, which accepts "mf" as a gender. There is nothing wrong with that, but it is incompatible with both the new module and the old templates. Consequently, the template has to convert the gender information into a new format, through a conditional which then forwards it onto the templates {{m|f}}. What I am proposing is to allow each individual template to specify in its own terms how multiple genders are to be indicated, and to expose a single interface on the Lua side, using a table of strings. An example would be Module:nl-head, which has a g2= parameter. If a template decides to combine multiple genders into one parameter, then it also carries the responsibility of splitting/interpreting its parameter before passing it on to Module:gender and number. So basically, I am arguing that splitting on commas/spaces should not happen in Module:gender and number, but in the modules that call it. Of course, if you think that we should rather get rid of multiple parameters for genders (therefore, remove the g2= parameter) and use a single string that contains all information encoded within it in some format, then that's different. But I'm not sure what benefits there would be in such an approach. The advantage to forcing each module to take "responsibility" for the split itself is that it is able to analyse the genders itself and perhaps add categories. For example, both {{nl-noun}} (through Module:nl-head) and {{sl-noun}} check to see whether the gender is correct; such a check would be more difficult if the whole multi-gender parameter is passed verbatim to Module:gender and number, and would probably mean that the calling module has to split the string anyway to get the information from it, which somewhat defeats the purpose of deciding to let Module:gender and number perform the split. —CodeCat 19:35, 9 March 2013 (UTC)
- Re: {{es-noun|mf}}: If we want to keep these sorts of ad hoc notations, then fine, but it would be better to write {{es-noun|m,f}}. This way it's easy for all templates to do it the same way: a way that's (hopefully¹) easy for users to remember.
Re: g2=: Absolutely, I think this is a bad user interface, incredibly inconsistent between otherwise-identical templates. We were restricted to these sorts of hacks when we were tacking multiple-gender support onto a system that already supported a single gender, but we aren't anymore! (Also, BTW, if you're going to be all microoptimization-obsessed beyond anything that could conceivably be measured, then I believe you should prefer splitting in Lua over templates that take multiple parameters.)
Re: {{nl-noun}} and {{sl-noun}}: If they perform the split anyway, then there's nothing to discuss; they can pass the resulting table into export.format, exactly as you'd already planned. (Note that your argument applies just as well to your existing list approach: both {{nl-noun}} and {{sl-noun}} have to loop over the gender specifications to validate them, which defeats the purpose of letting Module:gender and number handle the looping!)
¹ Speaking of users, we should ask them about this. I mean, I already know what Stephen will say, and if you start the discussion then I already know what DCDuring will say, but we should ask normal editors, too.
—Ruakh 20:12, 9 March 2013 (UTC)
- In that case, I think supporting a single gender parameter for all languages is a good idea. But I love to be nitpicky so I am not sure if I like separating them with commas. How about "m/f" instead of "m, f"? Keep in mind that while it may be nice to have the gender code entered the same way as it's displayed, there's no guarantee that the current module will always display "m, f". Maybe we will decide someday that we prefer "m or f" instead. —CodeCat 20:21, 9 March 2013 (UTC)
- Yeah, I'm not married to commas. One thing that I don't like about my single-parameter approach is that it essentially creates a mini-language with two infix operators, so it needs to be obvious at a glance which operator has higher precedence. I'm not sure that commas meet that test: is it obvious that m,f-p means "{m},{f-p}" and not "{m,f}-{p}"? I'm not sure. A better option might be semicolons: m;f-p, maybe? Or maybe it's really not possible without spaces: m f-p or m, f-p or m; f-p or whatnot. One advantage of commas, of course, is that as long as we do display them with commas, the commas will be the easiest operator to remember. —Ruakh 21:21, 9 March 2013 (UTC)
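The precedence question can be pinned down with a tiny parser. This hypothetical sketch assumes ";" separates alternative specifications and "-" joins the parts of one specification, so ";" binds more loosely and "m;f-p" parses as {m} and {f, p}:

```lua
-- Hypothetical parser: outer separator ";" (alternatives), inner
-- separator "-" (parts of a single specification).
local function parse(input)
    local specs = {}
    for spec in input:gmatch("[^;]+") do
        local parts = {}
        for part in spec:gmatch("[^%-]+") do
            table.insert(parts, part)
        end
        table.insert(specs, parts)
    end
    return specs
end
```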
- (I'm a bit late to the party, I guess, and haven't even looked at the code y'all're discussing, so am commenting based only on what I've gleaned from the discussion here (and what minimal intelligence I can lay claim to).) How about m;f,p to code "masculine; feminine plural"? I think that's easy for non-coders to remember, as it sort-of matches normal English usage. Even better, how about m,fp or m;fp, but only if concatenation without delimiters is possible, which I don't know.—msh210℠ (talk) 04:26, 10 March 2013 (UTC)
- Re: concatenated without delimiters: That would be awkward, because not all the templates in Category:Gender and number templates have single-letter names. I don't see any truly ambiguous potential sequences, but it would still be icky. (Also, that category doesn't contain all possible codes. {{pf.}} and {{impf}} belong to essentially the same class, and CodeCat is now hoping to introduce an, in, and pr.) But comma-and-semicolon seems fine to me. —Ruakh 07:02, 10 March 2013 (UTC)
- Perfect and imperfect can be separated from the others, because they are used for a different part of speech. At least, I'm not aware of any verb that has gender. Verb forms may, but we indicate that in the form-of definition rather than on the headword line, and no verb lemma has a single gender as far as I know. —CodeCat 13:30, 10 March 2013 (UTC)
- It's true that we're unlikely to combine
{{pf.}}("perfective") or{{impf}}("imperfective") with{{m}}or{{p}}, but we probably want the same module to handle them, for a few reasons:{{t|xx|foobar|m}}and{{t|xx|foobar|impf}}should both work, and{{t}}shouldn't need to examine its argument to try to decipher what module handles it.- we want user-input to be handled analogously; if m;f means "masculine or feminine", and pf and impf mean "perfective" and "imperfective" (respectively), then pf;impf should mean "perfective or imperfective").
- we want presentation to be analogous.
- Incidentally, your pr code ("personal") suggests a point of possible overlap between noun classes and verb classes, though that doesn't really matter in and of itself.
- —Ruakh 23:24, 10 March 2013 (UTC)
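Ruakh's point that m;f and pf;impf should be read the same way can be illustrated with a small plain-Lua parser. This is a hypothetical sketch: the function name describe_gender_spec, the label table, and rendering ";" as "or" are assumptions, not the behavior of any existing Wiktionary template or module. Here ";" separates alternatives and "," joins the codes within one alternative.

```lua
-- Hypothetical label table; not taken from any real Wiktionary module.
local LABELS = {
    m = "masculine", f = "feminine", n = "neuter",
    s = "singular", p = "plural",
    pf = "perfective", impf = "imperfective",
}

-- Parse a spec such as "m;f,p": ";" separates alternatives,
-- "," joins the codes that make up a single alternative.
local function describe_gender_spec(spec)
    local alternatives = {}
    for alternative in spec:gmatch("[^;]+") do
        local words = {}
        for code in alternative:gmatch("[^,]+") do
            code = code:match("^%s*(.-)%s*$") -- trim surrounding spaces
            table.insert(words, LABELS[code] or ("unknown code '" .. code .. "'"))
        end
        table.insert(alternatives, table.concat(words, " "))
    end
    return table.concat(alternatives, " or ")
end

print(describe_gender_spec("m;f,p"))   --> masculine or feminine plural
print(describe_gender_spec("pf;impf")) --> perfective or imperfective
```

Because noun and verb codes go through the same two separators, the same parser handles both, which is the consistency argued for above.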
Final-removing Lua function
Do we have a Lua function that takes a single Hebrew-script word and converts any finals to their medial forms, returning the word unchanged if it contains no finals? I could use one for Yiddish, and I imagine it could be quite helpful for Hebrew as well. Please note that the requirements will be slightly different, though, because in Yiddish the medial form of ף is פֿ. Thanks! —Μετάknowledgediscuss/deeds 03:47, 15 March 2013 (UTC)
- I've now created Module:yi-utilities with such a function. You might also look through Module:he-utilities and see if there's anything there that you want to appropriate. (It does have a function to convert an individual letter from medial-or-final to medial, but not one that accepts an entire word. I guess there's no reason it couldn't.) —Ruakh 17:21, 15 March 2013 (UTC)
- Excellent! Thanks! —Μετάknowledgediscuss/deeds 19:38, 15 March 2013 (UTC)
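The kind of function requested above can be sketched in plain Lua. This is a hypothetical illustration (the names finals_to_medial, FINAL_TO_MEDIAL, YIDDISH_OVERRIDES and the lang argument are made up), not the code in Module:yi-utilities or Module:he-utilities; inside Scribunto one would more likely use the mw.ustring functions. It relies on every Hebrew letter being a two-byte UTF-8 sequence whose lead byte is 0xD7 (decimal 215), and it writes the letters as byte escapes so the combining rafe is unambiguous.

```lua
-- Hypothetical sketch: convert Hebrew-script final letters to medial forms.
local FINAL_TO_MEDIAL = {
    ["\215\154"] = "\215\155", -- final kaf -> kaf
    ["\215\157"] = "\215\158", -- final mem -> mem
    ["\215\159"] = "\215\160", -- final nun -> nun
    ["\215\163"] = "\215\164", -- final pe -> pe
    ["\215\165"] = "\215\166", -- final tsadi -> tsadi
}

-- Yiddish differs only for fe: final fe corresponds to pe + rafe (U+05BF).
local YIDDISH_OVERRIDES = {
    ["\215\163"] = "\215\164\214\191",
}

local function finals_to_medial(word, lang)
    local function swap(char)
        if lang == "yi" and YIDDISH_OVERRIDES[char] then
            return YIDDISH_OVERRIDES[char]
        end
        -- Returning nil tells gsub to keep the original character.
        return FINAL_TO_MEDIAL[char]
    end
    -- "\215." matches one two-byte character with lead byte 0xD7; all Hebrew
    -- letters have that lead byte. The parentheses drop gsub's match count.
    return (word:gsub("\215.", swap))
end
```

For example, finals_to_medial("שלום") gives "שלומ", a word without finals comes back unchanged, and with lang set to "yi" a final ף becomes פֿ rather than plain פ.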
Some advice?
I'm working on Module:ca-head, and I have a question about the make_plural function. It should work like this: each replacement is tried in sequence, and as soon as a replacement is made, the result is returned. It already works, but it seems like a rather bad way to do it, because each possibility has to be matched twice: first to see if it's in the string, and then again to do the actual replacement. Do you know of a more elegant way to do this, one that automatically "aborts" all remaining possibilities once a successful replacement is made? —CodeCat 21:30, 20 March 2013 (UTC)
- I think "elegant" is subjective, but probably the tersest approach is to write a helper function that handles an arbitrary number of non-cascading substitutions — maybe something like this:
local function ending_swapper(base, ...)
    -- Remaining arguments come in pairs: an ending pattern, then its replacement.
    local swaps = { ... }
    for i = 1, #swaps, 2 do
        local ret, n = mw.ustring.gsub(base, swaps[i] .. '$', swaps[i+1])
        if n > 0 then
            return ret -- stop at the first ending that matched
        end
    end
    return nil
end
- and then use it something like this:
function make_plural(base, gender)
    -- Endings in -a: -a becomes -es, with spelling adjustments (ça -> ces, gua -> gües, ...).
    local ret = ending_swapper(base, "ça","ces", "ca","ques", "qua","qües", "ja","ges", "ga","gues", "gua","gües", "a","es")
    if ret then return ret end
    -- Stressed-vowel endings get an -n- before the -s (à -> ans, í -> ins, ...).
    ret = ending_swapper(base, "à","ans", "[èé]","ens", "([gq])uí","%1uins", "([aeiou])í","%1ïns", "í","ins", "[òó]","ons", "ú","uns")
    if ret then return ret end
    if gender:find("^mf?$") then
        -- Masculine words ending in -s, -ç, -x or -z take -os.
        ret = ending_swapper(base, "às","asos", "[èé]s","esos", "([gq])uís","%1uisos", "([aeiou])ís","%1ïsos", "ís","isos", "[òó]s","osos", "ús","usos", "[çsxz]","%0os")
        if ret then return ret end
        if base:find("sc$") or base:find("st$") or base:find("xt$") then
            return base .. "s", base .. "os"
        end
    end
    if gender == "f" then
        -- Feminine words ending in -s are invariable.
        if base:find("s$") then return base end
        if base:find("sc$") or base:find("st$") or base:find("xt$") then
            return base .. "s", base .. "es"
        end
    end
    return base .. "s"
end
- . . . which isn't an all-or-nothing deal. For example, you could take the concept of having a helper function that calls gsub and that returns nil when there's no match, but instead of taking many arguments at once, you could chain the calls like return h(base, "ça", "ces") or h(base, "ca", "ques") or ... or (gender:find("^mf?$") and (h(base, "às", "asos") or h(base, "[èé]s", "esos") or ...)) or .... Or whatever.
- —Ruakh 06:52, 21 March 2013 (UTC)
- It does look like an OK solution, though not the clearest one. Terseness is nice, but it shouldn't be detrimental to code clarity. I do like your or-solution, though... that kind of fits my idea of "elegant" because it uses the language's own idioms. I'll see what I can do. Thank you. —CodeCat 13:42, 21 March 2013 (UTC)
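The "or" chaining idea from this thread can be sketched in plain Lua. This is a hypothetical, cut-down illustration, not the code that went into Module:ca-head: swap_ending and make_plural_sketch are made-up names, it uses string.gsub instead of mw.ustring.gsub so that it runs outside Scribunto, and it covers only a few endings. Because string.gsub works on bytes, it uses only literal endings such as "ès" and avoids character classes like "[èé]", which would not match UTF-8 letters correctly.

```lua
-- Replace `ending` at the end of `base`; return nil when nothing matched,
-- so that alternatives can be chained with `or`.
local function swap_ending(base, ending, replacement)
    local result, count = base:gsub(ending .. "$", replacement)
    if count > 0 then
        return result
    end
    return nil
end

-- Each alternative is tried in order; the first non-nil result wins, so no
-- ending is matched twice.
local function make_plural_sketch(base, gender)
    return swap_ending(base, "ça", "ces")
        or swap_ending(base, "ca", "ques")
        or swap_ending(base, "gua", "gües")
        or swap_ending(base, "a", "es")
        or (gender == "m" and (swap_ending(base, "ès", "esos")
            or swap_ending(base, "ús", "usos")))
        or base .. "s"
end

print(make_plural_sketch("vaca", "f"))    --> vaques
print(make_plural_sketch("francès", "m")) --> francesos
```

The short-circuiting of `or` is what provides the early "abort": once one call returns a string, none of the remaining gsub calls run.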
April 2013
A request for your input
Can you have a look at Module talk:ru-translit#How can this be used from another Lua module?? —CodeCat 12:57, 11 April 2013 (UTC)
technical question
Hi. Could you tell me what parameter can be added to a URL, for example http://en.wiktionary.org/w/index.php?title=Title&action=edit, to display the names of MediaWiki messages instead of their normal text, e.g. (PAGETITLE) instead of the page title? I can't remember it and I can't find it here. Maro 23:55, 14 April 2013 (UTC)
- http://en.wiktionary.org/w/index.php?title=Title&action=edit&uselang=qqx. (uselang=... specifies an interface language; for example, you can use uselang=pl to view the interface in Polish, though of course then you lose the benefit of all our nicely customized English messages with helpful links. qqx is a "private use" code — it will never be assigned to a real language — and the uselang feature uses it for the purpose that you describe.) —Ruakh 04:36, 15 April 2013 (UTC)
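The tip above amounts to appending uselang=qqx to a page URL. A tiny hypothetical Lua helper (the name with_uselang is made up) shows the one detail that is easy to get wrong: the parameter needs "&" if the URL already has a query string and "?" otherwise. It assumes the URL has no existing uselang parameter and no #fragment.

```lua
-- Hypothetical helper: append a uselang parameter to a wiki URL.
-- Assumes the URL has no uselang parameter yet and no "#fragment".
local function with_uselang(url, code)
    -- Plain find (fourth argument true) so "?" is not treated as a pattern.
    local separator = url:find("?", 1, true) and "&" or "?"
    return url .. separator .. "uselang=" .. code
end

print(with_uselang("http://en.wiktionary.org/w/index.php?title=Title&action=edit", "qqx"))
--> http://en.wiktionary.org/w/index.php?title=Title&action=edit&uselang=qqx
```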
Your name came up
...in IRC discussion regarding the DICT project. Just sayin'... Would you have an interest if things started to move forward on this one? - Amgine/ t·e 15:47, 16 April 2013 (UTC)
- If you are interested in mentoring someone on this, we could move the project to one of those that are "Featured" and thus more likely to get a student interested. -- ☠MarkAHershberger☢(talk)☣ 18:07, 16 April 2013 (UTC)
- It sounds like a valuable project, so I hope someone steps up, but I'm not sure that someone should be me. What exactly is involved in being a mentor? Especially — how much time would I be expected to commit? A GSOC is a big deal for a student — it's a lot like an internship — so it would really be unfair to him/her if I (or anyone) agreed to mentor but didn't actually commit the necessary time. (I imagine it could also potentially damage Wiktionary's ability to get GSOC students in the future.) I recently started working at the world's largest online retailer, and I love it, but the rumors are really true about the incredibly long hours that developers put in. I have way, way less spare time now than I used to. :-P —Ruakh 06:38, 17 April 2013 (UTC)
- As for the mentoring schtick... While I would love for you to be able to work on the DICT project, we don't have a student working on that one. On the other hand, a much lower-time-cost project *does* need a community liaison: Bugzilla and GSOC application draft. This project is to build a pronunciation recording tool for Wiktionary, so that our readers have a simple way to contribute a recording of a word's pronunciation. There is a WMF developer to do the software-side mentoring, but the student needs someone who is a regular part of the Wiktionary GP community to advise and to go back and forth with the Grease Pit regulars for input/reporting. I think it should be about a half-hour or so per week, if the application is approved, mostly e-mailing. - Amgine/ t·e 15:28, 29 April 2013 (UTC)
Module:he-translit
Hi,
I don't know if this tool can become really useful, but it can definitely get better. Could you check it, add missing letters (if any) and transliterate the diacritics, please? If the pronunciation differs depending on the position, could you add a short comment, please? Module:ar-translit is a bit more advanced, but it can't do a perfect job for obvious reasons, e.g. اَلْلُغَةُ ٱلْعَرَبِيَّةُ: al-luġa(t) al-ʿarabiyaa(t)
I can't read Hebrew, but it may be easier for me to see which letters are used with the tool. For example, calling the module on מִפְעָל currently produces mif(a/o)l. What should it be? Do you think it's possible to transliterate fully vocalised Hebrew in a more or less accurate way? --Anatoli (обсудить/вклад) 23:03, 17 April 2013 (UTC)
- Since there's no way for it to distinguish between a kamatz gadol (transliterated as "a" on Wiktionary according to WT:AHE) and a kamatz katan ("o"), the automated transliterations would not be perfect, so it would probably be preferable to fill them in manually, no? --Yair rand (talk) 23:29, 17 April 2013 (UTC)
- Of course, a manual override is preferable if there are ambiguities, but could there be default values, "a" and "o"? Another option is to follow Persian and Arabic and put "(a/o)", "(e/ei)" in brackets with a slash, so that people know they have to decide which one is right, as I did with ["פ"]='(p/f)', ["ף"]='(p/f)'. --Anatoli (обсудить/вклад) 23:45, 17 April 2013 (UTC)
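The "(a/o)" approach amounts to a lookup table in which ambiguous signs map to a bracketed choice. The following is a hypothetical plain-Lua sketch, not the contents of Module:he-translit: the table covers only the signs in מִפְעָל plus a few others, it treats ayin and sheva as silent (an assumption; a vocal sheva would need "e"), and it writes the Hebrew code points as UTF-8 byte escapes so that the combining vowel points are unambiguous. Editors could resolve the kamatz ambiguity by typing the dedicated qamats qatan point (U+05C7), which the table maps to plain "o".

```lua
-- Hypothetical mapping table; a real module would need far more entries.
local MAP = {
    ["\215\158"] = "m",     -- mem
    ["\215\156"] = "l",     -- lamed
    ["\215\162"] = "",      -- ayin (assumed silent)
    ["\215\164"] = "(p/f)", -- pe: p or f, undecided without dagesh handling
    ["\215\163"] = "f",     -- final pe
    ["\214\180"] = "i",     -- hiriq
    ["\214\183"] = "a",     -- patah
    ["\214\176"] = "",      -- sheva (assumed silent here)
    ["\214\184"] = "(a/o)", -- qamatz: gadol "a" or katan "o"
    ["\215\135"] = "o",     -- explicit qamats qatan (U+05C7)
}

local function translit(word)
    -- Hebrew-block code points are two-byte UTF-8 sequences with lead byte
    -- 0xD6 or 0xD7; gsub keeps any match whose table entry is nil.
    return (word:gsub("[\214\215][\128-\191]", MAP))
end

-- mem, hiriq, pe, sheva, ayin, qamatz, lamed
print(translit("\215\158\214\180\215\164\214\176\215\162\214\184\215\156")) --> mi(p/f)(a/o)l
```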
Rukhabot
I know you're really busy; I was just hoping that you could run Rukhabot a bit more often. It's been more than 3 weeks since the last run, and we are slowly becoming somewhat dependent on bots like these. Thank you, and however much time your job is eating up, I hope you're enjoying it! —Μετάknowledgediscuss/deeds 05:47, 24 April 2013 (UTC)
- Which one? (I assume you mean either interwikis or trans-links?) —Ruakh 02:51, 25 April 2013 (UTC)
- Both are good, but trans-links are, AFAIK, the sole domain of Rukhabot, and thus more important. —Μετάknowledgediscuss/deeds 03:05, 25 April 2013 (UTC)
- Thanks for this, as well as for drastically improving the layout of {{af-personal pronouns}}. I wouldn't have thought to arrange it thus, but I do believe it looks better now. —Μετάknowledgediscuss/deeds 01:41, 3 May 2013 (UTC)
Links to Hebrew-script terms
Category:term cleanup/sc=Hebr contains a list of pages that use {{term}} with sc=Hebr but without a language. These should have a language instead of a script, but a bot can't automatically replace them (unlike, say, Gothic script) because there are several languages that are written in Hebrew script. Could you help? —CodeCat 14:03, 27 April 2013 (UTC)
Python question
Hi.
I know you're busy, and will perfectly understand your not replying (or replying in the negative) to this, but I have a pywikipediabot question (about regexes; I'm trying to do in Python what a JavaScript script you once wrote does). It's at [2]; any ideas you can provide would be much appreciated.—msh210℠ (talk) 05:07, 3 May 2013 (UTC)
- Oh, that message mentions user-fixes.py. That's described at [[mw:Manual:Pywikipediabot/user-fixes.py]]: essentially, it's a bunch of so-called fixes, each of which is a hash that specifies a regex replacement and an edit summary.—msh210℠ (talk) 05:12, 3 May 2013 (UTC)
Okay, I've got an answer there; I'll try it out (I've no time to now); so, meanwhile, completely ignore the above.—msh210℠ (talk) 20:34, 3 May 2013 (UTC)
Forced user renames coming soon for SUL
Hi, sorry for writing in English. I'm writing to ask you, as a bureaucrat of this wiki, to translate and review the notification that will be sent to all users, also on this wiki, who will be forced to change their user name on May 27 and will probably need your help with renames. You may also want to help with the pages m:Rename practices and m:Global rename policy. Thank you, Nemo 13:09, 3 May 2013 (UTC)
Your replacement for Template:context
I came across {{plural}}, which seems rather redundant to me compared to {{p}}: it literally just contains ''plural'', so there is no advantage over just typing that out, since it's the same number of characters. I was looking at the transclusions and noticed that a significant number of them are caused by {{context}}. That made me think about your efforts to replace the template with something else. Now that there is Lua, I presume you'd want to write it in Lua instead. Do you think you could try to do that anytime soon? I could also try if you don't have the time. —CodeCat 19:29, 6 May 2013 (UTC)
Some errors in Module:ko-translit
Hi,
Could you please fix the module when you have time? It has errors at the moment (they weren't caused by my last edit; I know it was working after that, so it's something else). --Anatoli (обсудить/вклад) 01:06, 8 May 2013 (UTC)
- So . . . many . . . talk-page . . . comments. Yours wins for the most interesting debugging problem. Previously, even global variables were not shared between Scribunto modules that import-ed each other, but that must have changed, I guess. Inserting local, so that Module:ko-hangul and Module:ko-translit weren't both messing with the same p, fixed the issue. (The test-cases are still failing, but only because the transliteration scheme implemented by the module differs from the one the test-cases expect. You should have no difficulty fixing that.) —Ruakh 08:33, 11 May 2013 (UTC)
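The bug described above (two modules both assigning a global p) can be reproduced in miniature in plain Lua. This is a hypothetical analogy, not Scribunto itself: the toy modules "translit" and "hangul" are made up, and they are loaded with load (loadstring on Lua 5.1), which runs each chunk in the shared global environment, the situation Ruakh infers Scribunto had moved to.

```lua
local load_chunk = loadstring or load -- Lua 5.1 vs. 5.2+

-- Source of a toy module. With use_local = false it keeps its table in a
-- global p, like the broken module; p.run() looks p up at call time.
local function toy_module_source(name, use_local)
    local decl = use_local and "local p = {}" or "p = {}"
    return decl .. "\n"
        .. "function p.name() return '" .. name .. "' end\n"
        .. "function p.run() return p.name() end\n"
        .. "return p\n"
end

local function load_toy_module(name, use_local)
    return assert(load_chunk(toy_module_source(name, use_local)))()
end

-- Global p: loading the second module silently redirects the first one.
local translit = load_toy_module("translit", false)
load_toy_module("hangul", false)
local shared_result = translit.run()
print(shared_result) --> hangul

-- local p: each module keeps its own table.
local translit_fixed = load_toy_module("translit", true)
load_toy_module("hangul", true)
local fixed_result = translit_fixed.run()
print(fixed_result) --> translit

p = nil -- remove the global left behind by the first experiment
```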
User_talk:Rukhabot#Autotranslit to write transliterations from modules where it's missing?
Hi Ran,
I've posted in User_talk:Rukhabot#Autotranslit to write transliterations from modules where it's missing?. Do you think it's feasible?
Also, Category:Hebrew translations lacking transliteration may need attention of active Hebrew editors. --Anatoli (обсудить/вклад) 01:15, 21 May 2013 (UTC)