Wiktionary:Grease pit/2014/April

Typography update

The typography update went live today. If anyone notices any problems on Wiktionary related to the update, please ping me. One issue I'm aware of is that some combining diacritics and tie characters may be incorrectly positioned in Firefox on MacOS or Linux (due to lack of proper glyph positioning data in the Liberation Sans and Helvetica Neue fonts). The problem does not seem to occur in other browsers, however. If this problem is significant, the following can be added to MediaWiki:Vector.css:

html,
body {
    font-family: Arimo, Helvetica, Arial, sans-serif;
}

...to override the new font stack of...

html,
body {
    font-family: Arimo, "Liberation Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}

Kaldari (talk) 20:01, 1 April 2014 (UTC)[reply]

Recent layout changes

Moved from Wiktionary talk:News for editors

Since a couple of hours ago, the headings for words and languages have been shown in a different typeface, at least in my browser. Now that's fine, of course. But I'm wondering if it has anything to do with the strange phenomenon that homophones are somehow not clickable anymore. Example: Homophone: hello. The word "hello" should be blue, but it's not, again at least in my browser. Kolmiel (talk) 22:45, 1 April 2014 (UTC)[reply]

I don't think the layout changes themselves are causing the problem, but it's whatever other changes have been made during the update that seem to have caused it. A rather heavily used template, {{isValidPageName}}, which is used by the {{homophones}} template, is now broken. This code should display "valid", but instead it shows nothing: {{isValidPageName|hello}}. I don't know why it's not working anymore, although some further investigating would probably find the problem.

In any case, I wonder why {{homophones}} needs this template to begin with. We could bypass the problem if we removed it from there altogether. {{isValidPageName}} is listed as deprecated, so if we could orphan it we wouldn't even need to put any effort into fixing it. —CodeCat 22:57, 1 April 2014 (UTC)[reply]

I've created Module:User:CodeCat/isValidPageName as a temporary stopgap measure, and changed {{isValidPageName}} to use it. So {{homophones}} should work again as it did before. —CodeCat 23:16, 1 April 2014 (UTC)[reply]
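For readers wondering what a page-name validity check involves, here is a simplified Python approximation. It is not the logic of {{isValidPageName}} or of the stopgap module. It assumes only MediaWiki's default rule that a title may not be empty or contain # < > [ ] | { } or control characters. Real title validation has further rules (namespace prefixes, length limits, "~~~", relative paths) that are ignored here. The function name is hypothetical.

```python
# Characters MediaWiki's default configuration never allows in page titles.
FORBIDDEN = set("#<>[]|{}")


def is_valid_page_name(title: str) -> bool:
    """Simplified, hypothetical check; ignores namespaces, length limits, etc."""
    if not title.strip():
        return False  # empty or whitespace-only titles are invalid
    for ch in title:
        # Reject forbidden punctuation and ASCII control characters.
        if ch in FORBIDDEN or ord(ch) < 32 or ord(ch) == 127:
            return False
    return True


print(is_valid_page_name("hello"))  # True
print(is_valid_page_name("a|b"))    # False
```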

Yeah, it's all back to blue :) Thanks. Kolmiel (talk) 21:05, 3 April 2014 (UTC)[reply]

Search exact matches only

With the new update, I've noticed that when I want to go to a page that doesn't exist, instead of giving me the "create this page" message, it sometimes takes me to a page with a similar name. For example if I type "looma" in the search bar and go, it takes me straight to "lööma" instead. Is there a way to prevent that? —CodeCat 22:22, 2 April 2014 (UTC)[reply]

We need an exact-match function, but near matches should also be found. E.g. searching for этимолог (etimolog) should also give results for этимо́лог (etimólog) (with accent), but it doesn't: etymologist#Translations has an accented Russian translation, which is not picked up in the search results. --Anatoli (обсудить/вклад) 23:07, 2 April 2014 (UTC)[reply]

It's a feature. I personally don't particularly like it. But you can get to the literal page by typing in the URL. --Wiki Tiki 89 23:12, 2 April 2014 (UTC)[reply]

(E/C) Yes, when "looma" is typed, "lööma" appears in the search window, and just below it there's another line: "containing... looma". I never had any problem with this, actually. What gives me some grief is not being able to find accented terms: "looma" produces "lööma", but "этимолог" doesn't produce "этимо́лог". Arabic/Hebrew accents and hamza, and Hindi nuqta terms, should be mutually searchable. Hamza doesn't cause any problems, but Arabic accents do, e.g. "هٰذا" (accented) doesn't produce "هذا", and "फ़िल्म" (with nuqta) doesn't produce "फिल्म". --Anatoli (обсудить/вклад) 23:24, 2 April 2014 (UTC)[reply]

Workaround: in a browser that lets you perform custom searches from the address bar (most of them these days), you can set up various shortcuts to avoid wasting time with the search box. In my setup, "k dog" looks up dog on Wikipedia; "d dog" looks it up on Wiktionary; and "dd dog" starts editing it as a new Wiktionary entry (if not already existing). Equinox ◑ 23:23, 2 April 2014 (UTC)[reply]
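The shortcut approach amounts to registering URL templates with the browser, where `%s` stands for the typed term. This is the placeholder that Firefox keyword bookmarks and Chrome custom search engines use. The Python sketch below reproduces the three shortcuts described. The keywords match the post, but the exact templates are an illustration, not Equinox's actual setup. Because these URLs name the page directly, they bypass the search box's near-match jump entirely.

```python
from urllib.parse import quote

# Keyword -> URL template; "%s" is replaced by the (encoded) search term.
SHORTCUTS = {
    "k": "https://en.wikipedia.org/wiki/%s",
    "d": "https://en.wiktionary.org/wiki/%s",
    "dd": "https://en.wiktionary.org/w/index.php?title=%s&action=edit",
}


def expand_shortcut(query: str) -> str:
    """Turn e.g. 'd dog' into the URL the browser would open."""
    keyword, _, term = query.partition(" ")
    template = SHORTCUTS[keyword]  # raises KeyError for unknown keywords
    # MediaWiki treats underscores as spaces in titles; percent-encode the rest.
    return template.replace("%s", quote(term.strip().replace(" ", "_")))


print(expand_shortcut("d dog"))  # https://en.wiktionary.org/wiki/dog
```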

Try searching for the string 'category:latin verbs'. Whatever was changed with the most recent update, this is completely unacceptable. Who can we complain to? DTLHS (talk) 18:17, 3 April 2014 (UTC)[reply]

Interestingly, you do land on the search page if there's more than one page with diacritics. Thus while coln is a red link, cołn, Cöln, cōln, cōłn, and čoln are all blue, and if you type coln into the search box and hit return, it takes you to "create this page". —Aɴɢʀ (talk) 10:59, 8 April 2014 (UTC)[reply]

I'm the one you can complain to. What you are seeing is Cirrus, which recently stopped being a BetaFeature and became the primary search backend for all wikis other than the Wikipedias and a handful of others. This particular functionality has lots of history that I won't bore you with, but the upshot is that the old behavior wasn't well documented, was in some cases outright crazy, and didn't work properly with the new search backend. So I had to replace it when migrating search. I took a shot at the right thing to do and it worked for most wikis, except English Wiktionary. I did some work with a BetaFeature user on this wiki a few months ago and we ended up with the behavior that you see now. Of course, when we talked about it we weren't talking about the "create this page" links. We were talking about searches like those Angr mentioned above that find multiple terms after accent squashing. Anyway, tell me what you want to do.

One option is to disable accent squashing entirely. You'd be the only English wiki to do this so you'd be weird in that respect but given that your titles are in tons of languages this might make sense. Theoretically this'd make finding "lööma" harder for someone who doesn't think in accents. Disabling per user is much more difficult from a technical standpoint.

Another is to disable accent squashing on near matches. It's a smaller deviation from the rest of the English wikis, and it is closer to the old behavior. On the other hand, it yields confusing behavior in the search box: you type, wait for the prefix search to complete, see that the first result is the word you are looking for, then hit Enter. You'd only arrive on the first result if your word matched with accents; otherwise you'd end up on the search results page.

I can't think of any other options that don't fall into the "change your workflow" category. For example, instead of hitting Enter in the search box, you wait for the suggestions to spring back, then hit Up, then hit Enter. That'll skip the near-match searching and go straight to the results page.

Sorry for the long-winded reply. I'm happy to design what it should do for you here or in bugzilla:63682, which I've created to track this.

NEverett (WMF) (talk) 16:20, 8 April 2014 (UTC)[reply]
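To make "accent squashing" concrete, here is a minimal Python sketch using the standard unicodedata module. It decomposes text to NFD, drops nonspacing combining marks (category Mn), and recomposes the result. This only illustrates the idea; it is not CirrusSearch's actual analyzer configuration.

```python
import unicodedata


def squash_accents(text: str) -> str:
    """Remove combining diacritics: NFD, drop category-Mn marks, back to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


print(squash_accents("lööma"))            # looma
print(squash_accents("этимо\u0301лог"))   # этимолог
```

A naive fold like this also merges letters that some languages keep distinct. Russian й, for example, decomposes to и plus a combining breve, so it folds to и. Letters with no canonical decomposition, such as Polish ł, are left untouched.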

Maybe we could add back the trusty old "Go" and "Search" buttons that don't rely on the search suggestions box popping up? --Wiki Tiki 89 18:09, 8 April 2014 (UTC)[reply]

I'm not certain, but I believe that's a skin issue. I'm using Cologne Blue, which has the two buttons (and, incidentally, is not affected by the Typography refresh). - Amgine/^t·e 18:17, 8 April 2014 (UTC)[reply]

Let me rephrase: Maybe we could add them to the Vector skin. --Wiki Tiki 89 18:35, 8 April 2014 (UTC)[reply]

That's a skins question rather than a search question, so that's outside Nik's remit. We can talk about this separate problem later, after we've addressed this one. --Dan Garry, Wikimedia Foundation (talk) 20:43, 8 April 2014 (UTC)[reply]

Well it would solve this issue as well. --Wiki Tiki 89 00:17, 9 April 2014 (UTC)[reply]

I definitely don't want to disable accent squashing on near matches (or at all); it's the best way of finding terms to add to {{also}} templates at the tops of pages. —Aɴɢʀ (talk) 19:29, 8 April 2014 (UTC)[reply]

I think that if the user is directed to a page that is not the same as what the user typed, then it should be considered equivalent to a redirect, and displayed as such. Just blindly sending the user to another page, like now, is confusing and deceiving, and it's even worse when I actually want to go to the nonexistent page. —CodeCat 19:38, 8 April 2014 (UTC)[reply]

Let me see if I can do this. If the near-match mechanism decides to bounce you to a page and the page doesn't exactly match what you typed, then you get the same kind of notice you'd get on a redirect. I think for wikis whose titles are forced into title case we'd force the comparison to title case (but not here). NEverett (WMF) (talk) 20:17, 8 April 2014 (UTC)[reply]

In the past, we had this same kind of silent redirecting behaviour for differences in case too. But even for those, it was occasionally annoying and it would be useful at times to have a way to get to the intended casing. If this is changed so that silent redirects are shown more explicitly like real redirects are, then it would be much appreciated. It probably shouldn't display exactly the same thing though; instead of "redirected from" the text should say something else so that it's clear that the redirect was performed automatically by the software, not by an actual redirect page. —CodeCat 20:22, 8 April 2014 (UTC)[reply]

Certainly. Do you think the link should be to the non-existent page or to a search for the page? I'm leaning towards a search for the page because that is what you were trying to do when you got there anyway. Also, it has the advantage of dropping you on the page with the helpful page creation buttons in this wiki. NEverett (WMF) (talk) 20:34, 8 April 2014 (UTC)[reply]
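The "Go" behaviour discussed in this thread can be sketched in three steps. First, an exact title wins. Second, the search jumps only when exactly one title matches after accent squashing, which fits Angr's coln example. Third, a flag records that the target differs from the input, so a redirect-style notice can be shown. The names are hypothetical, and this is not CirrusSearch's near-match code. Comparison stays case-sensitive, as NEverett suggests for this wiki.

```python
import unicodedata
from typing import Iterable, NamedTuple, Optional


def squash_accents(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


class GoResult(NamedTuple):
    target: Optional[str]  # page to show, or None for the search-results page
    auto_redirected: bool  # True when target differs from what was typed


def go(typed: str, titles: Iterable[str]) -> GoResult:
    titles = set(titles)
    if typed in titles:
        return GoResult(typed, False)   # exact match: no notice needed
    key = squash_accents(typed)
    near = [t for t in titles if squash_accents(t) == key]
    if len(near) == 1:
        return GoResult(near[0], True)  # show a redirect-style notice
    return GoResult(None, False)        # none or several: search-results page
```

With the titles from Angr's example, `go("coln", ...)` lands on the search page, because both cōln and čoln fold to coln.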

@NEverett (WMF) Please don't disable near-match searches; see my first post. We should be able to do both: find the exact match and the near match. The near match should be improved to include searches with/without accent marks (Cyrillic, Arabic, Hebrew, Hindi). As I said, there is a Russian translation of "etymologist", "этимолог", but it's written with the stress mark, as it should be in dictionaries and encyclopedias: "этимо́лог". People searching for "этимолог" (without accent) won't find it. --Anatoli (обсудить/вклад) 00:36, 9 April 2014 (UTC)[reply]

As for the original request, as I also said, it's not a big issue (I do it all the time), if you reread my first post. When you type "looma", don't click on "lööma", which appears first; instead, linger a bit and select "containing... looma" below. You get the page Search results for "looma" - Wiktionary, and looma appears in red, ready to be clicked on. --Anatoli (обсудить/вклад) 00:44, 9 April 2014 (UTC)[reply]

@Atitarev OK. A few days brainstorming the right solution won't hurt, I think/hope. NEverett (WMF) (talk) 14:55, 9 April 2014 (UTC)[reply]

No support for lang code `cejm`?

I was in the process of adding JA term チャリンコ (charinko, “bicycle”), when I discovered that we don't support lang code cejm for the w:Jeju language. A number of Japanese terms could make use of this lang code in calls to {{etyl}}. How hard would it be to add this? Would anyone object to adding this? Or would the proposed ISO 639-3 code jjm be preferable? (That one's also not currently supported.) ‑‑ Eiríkr Útlendi │ Tala við mig 00:27, 3 April 2014 (UTC)[reply]

Etyl: languages don’t need to have a code. You can add it as "Jeju" if you prefer. — Ungoliant ^(falai) 00:36, 3 April 2014 (UTC)[reply]

To expand on Ungoliant's comment: if there won't be Jeju entries (with ==Jeju== L2s), and it'll only be cited in etymologies, then it can be added to the last section ("Other lects") of Module:etymology language/data under any unambiguous (unique) code. You could use "Jeju" as its code, but given that an ISO 639-3 code has been proposed and we are wont to switch from exceptional codes to ISO 639-3 codes whenever the latter are or become available, it might save some work in the long run to just code it as "jjm" now. - -sche (discuss) 03:29, 3 April 2014 (UTC)[reply]

Looks like User:DerekWinters added it in this diff. Thanks, Derek! ‑‑ Eiríkr Útlendi │ Tala við mig 05:08, 3 April 2014 (UTC)[reply]

Japanese term チャリンコ (charinko, “bicycle”) appears to derive from Jeju 자륜거 (jaryun-geo). This is definitely not a standard Korean term, from what I've been able to find. The Korean for bicycle is 자전거 (jajeon'geo, “jajeon-geo”), deriving from the same Sinitic term 自轉車 as the Japanese simplified 自転車 (though it might originally be a Meiji-era Japanese coinage from Western contact, cf. these two links in Japanese, which I haven't verified). The Jeju 자륜거 (jaryun-geo) derives instead from 自輪車 (different middle character: 輪 (“wheel”) instead of 轉 (“revolve”)).

My current understanding is that I should enter {{term|lang=jjm|자륜거|tr=jaryun-geo}}. However, although the jjm lang code now works with {{etyl}}, it doesn't for {{term}}.

Is it kosher to use the ko lang code instead? That doesn't seem quite right, but at least it won't generate any Module Error warnings. Or should I just not link this term at all? That doesn't seem quite right either, given the stated WT mission of all words in all languages. We probably won't have much Jeju content over the short term, but it's conceivable, and ultimately I think desirable, that we could build up a Jeju terms corpus here. ‑‑ Eiríkr Útlendi │ Tala við mig 17:38, 3 April 2014 (UTC)[reply]

For an etymology-only language, you use the language's own code in {{etyl}} and the parent language's code in {{term}}, yes. So, {{etyl|jjm|ja}} {{term|foo|lang=ko}}, as you suspect. (I wonder where that should be documented.)
Is Jeju separate enough from Korean to merit its own entries? WP says "many Koreans, including those who speak Jeju, consider Jeju a dialect of Korean, [but] it can be considered a separate language because it is nearly mutually unintelligible with Korean dialects of the mainland". Does that pertain to the spoken form, or the written form? There is a vote underway to merge Chinese lects that are mutually unintelligible when spoken, but intelligible when written.
If Jeju does merit its own entries, jjm will need to be removed from Module:etymology language/data and a different code will need to be added to Module:languages/datax (until such time as Jeju has an ISO code and can go in a different part of Module:languages). But here we run into a technical problem. The way datax exceptional codes are named is "(the language's family's three-letter ISO 639-5 code)-(three letters representing the language itself)". The Koreanic languages do not seem to have an ISO 639-5 family code. On RFM, we have run into the same problem with Lencan. We could use the second half of exceptional codes we've granted the Lencan and Koreanic families as the family prefixes of the languages' exceptional codes (so Jeju might be "kor-jjm"), but that would be problematic if the ISO ever assigned those strings to different families. We could use "qfa" or "und" as the prefix (so Jeju might be "qfa-jjm" or "und-jjm"), but I'm not sure which of those would be better. Hmm, probably "und". So, "und-jjm"... - -sche (discuss) 18:11, 3 April 2014 (UTC)[reply]

Re: WP says "many Koreans, including those who speak Jeju, consider Jeju a dialect of Korean, [but] it can be considered a separate language because it is nearly mutually unintelligible with Korean dialects of the mainland". Does that pertain to the spoken form, or the written form?

I strongly suspect both. Chinese uses an ideographic writing system, where radically different readings / pronunciations can apply to the same spelling. I can therefore read certain strings of Chinese in a way that only a Japanese speaker would understand. Korean, meanwhile, uses an alphabetic writing system, where the sounds and the glyphs are much more closely tied. As such, I would be surprised if written Jeju is all that much closer to standard Korean than spoken Jeju, or vice versa.

That said, there are two ISO four-letter codes, cejm for Jeju in general, and chjm for spoken Jeju, so perhaps there is substantial variance. Then again, the ISO codes have been a bit odd sometimes, such as including separate codes for varieties of Levantine Arabic that apparently are nearly fully mutually intelligible, while missing separate codes for lects that aren't mutually intelligible (for instance, full-bore Tōhoku-ben in w:Iwate Prefecture left me absolutely befuddled, with different verb endings and different nouns than standard Japanese). Given this inconsistency, I cannot tell if the existence of these two codes necessarily indicates any significant difference between spoken and written Jeju.

Re: If Jeju does merit its own entries, jjm will need to be removed from Module:etymology language/data and a different code will need to be added to Module:languages/datax (until such time as Jeju has an ISO code and can go in a different part of Module:languages).

My understanding from [[w:Jeju language]] is that Jeju already has two ISO codes: cejm and chjm. The three-letter jjm code is currently only proposed (possibly just proposed this year, if the “2014” listed here indicates the year), but the four-letter codes are official, as far as I can tell. For instance, enter “Cheju” for Language Reference Name at http://www.geolang.com/iso639-6/ and you'll get both of these codes.

FWIW, searching http://www-01.sil.org/iso639-3/iso-639-3_Name_Index.tab for “Korean” indicates a three-letter code of kor, leading me to think it would be unlikely for kor to be reassigned to anything other than Korean. ‑‑ Eiríkr Útlendi │ Tala við mig 18:53, 3 April 2014 (UTC)[reply]

ISO doesn't actually matter. Wiktionary doesn't follow ISO; it follows the BCP 47 subtag registry. —CodeCat 19:31, 3 April 2014 (UTC)[reply]

@Eirikr: I mean a three-letter ISO code; we never use four-letter language codes (you may have noticed). And the "kor" you find is for the Korean language, not the family.
@CodeCat: Even within the past couple of months, we've incorporated new three-letter ISO codes without anyone checking the BCP registry. So, following the BCP registry may be a goal some users periodically remember to check if we're meeting, but it's not wrong to observe that in day-to-day practice, the ISO is what gets consulted. - -sche (discuss) 19:58, 3 April 2014 (UTC)[reply]

In light of the above then, it sounds like the best course would be to add und-jjm to Module:languages/datax for the time being. Do I understand that correctly? Or, do we seek more input on whether Jeju is different enough to merit a code? ‑‑ Eiríkr Útlendi │ Tala við mig 23:31, 3 April 2014 (UTC)[reply]
- We normally create new codes by adding something made up to an existing family code. If the family code itself doesn't exist, the same process is applied. We use fiu-fin-pro for Proto-Finnic, for example. —CodeCat 00:09, 4 April 2014 (UTC)[reply]

Whether this was by design or merely happened this way, proto-languages fit a slightly different naming scheme than other languages: "-pro" is added to the entire code of the family the proto-language is the ancestor of, whether that code is three characters (e.g. an ISO code like "gem" : "gem-pro", "Proto-Germanic") or seven characters (e.g. the example you give, "fiu-fin" : "fiu-fin-pro", "Proto-Finnic"). I'm not aware of a non-proto language that does that, i.e. that uses a code of more than seven characters. It is an interesting idea, though: applying the proto-languages' naming scheme to non-proto-languages to get 'qfa-kor-jjm', as opposed to 'und-jjm'. And although I don't expect we'd ever reach the limit of codes that would be possible under a 'und-xxx' naming scheme, the 'qfa-yyy-xxx' scheme would give us the maximum flexibility to select characters for the last three places (the '-xxx') that clearly represented the language's name. (By which I mean, if every exceptional code for a language whose family lacked an ISO code were prefixed with 'und-', and we needed to grant codes to a Lencan Foobaar language, a Koreanic Foobahr language and a Keresan Foobr language, we'd have to encode them in more creative and thus potentially less memorable ways like 'und-fob', 'und-fbh' and 'und-fbr', whereas under the 'qfa-xxx' scheme each could be 'qfa-(whatever)-foo' or such.) So I suppose 'qfa-kor-jjm' is a better idea than 'und-jjm'... - -sche (discuss) 22:34, 4 April 2014 (UTC)[reply]
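The two code shapes compared in this post can be written as small Python helpers. They only illustrate the naming conventions described in the thread; the function names are hypothetical, and nothing here comes from Module:languages.

```python
def proto_code(family_code: str) -> str:
    """Proto-language code: '-pro' appended to the whole family code."""
    return family_code + "-pro"


def exceptional_code(family_code: str, suffix: str) -> str:
    """Hypothetical: the same shape applied to a non-proto language."""
    if not (len(suffix) == 3 and suffix.isascii() and suffix.isalpha() and suffix.islower()):
        raise ValueError("suffix should be three lowercase ASCII letters")
    return f"{family_code}-{suffix}"


print(proto_code("fiu-fin"))               # fiu-fin-pro
print(exceptional_code("qfa-kor", "jjm"))  # qfa-kor-jjm
```

With a shared 'und-' prefix, every such language would instead draw its last three letters from one common pool. That is the crowding problem illustrated above with Foobaar, Foobahr and Foobr.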
- I object to "und-jjm" mainly because "und" is not a family code. However, we could decide to assign family codes from the private use area. That would avoid excessively long names: Finnic could become just "qfi", and Koreanic could get "qko". We could abandon the prefix "qfa" too if we decide to, replacing the few families that use it with something else. Doing the same for languages too might not be so good because there are too many of them to fit into the private use area. There's also w:List of ISO 639-3 language codes reserved for local use, which shows that many private use codes have been used by Linguist List. We could adopt their codes, or ignore them if we want to. —CodeCat 23:10, 4 April 2014 (UTC)[reply]

I went ahead and added qfa-kor-jjm to Module:languages/datax. Please whack me with the cluebat if that was in error. ‑‑ Eiríkr Útlendi │ Tala við mig 17:24, 7 April 2014 (UTC)[reply]
- An error, no. But premature, given the alternative I suggested, yes. So... *gives slight nudges with bat*. —CodeCat 17:29, 7 April 2014 (UTC)[reply]
  - Apologies, I didn't understand that you were proposing an alternative. I also saw qfa-kor-pro in the list, leading me to think that the qfa-kor- prefix was already accepted. But to be honest, I didn't go through the history to see when this was added and who added it. ...Though now that I do, I see it showed up in diff back in November, apparently being moved over from Module:languages/alldata, where this code appears in the first version of the page in diff.
  Somewhat confused, ‑‑ Eiríkr Útlendi │ Tala við mig 17:51, 7 April 2014 (UTC)[reply]
  I meant the alternative right above: using private-use codes for families lacking a code. —CodeCat 18:14, 7 April 2014 (UTC)[reply]
  On balance, I think "qfa-kor-jjm" is better than "qko-jjm". It's good for codes like Jeju's to be formed in the same way as proto-languages' codes, rather than in a new way; it's good to avoid proliferation of naming schemes. And forming proto-languages' codes by adding "-pro" to the family is a clear, straightforward scheme, as opposed to having the family be "qfa-kor" but the proto-language be "qko-pro". And changing the family code to "qko", i.e. switching from three-selectable-character family codes (qfa-___) to two-selectable-character codes (q__), would be unwise, IMO, if not infeasible. There are so many families and subfamilies of those families which the ISO has not granted codes to that we would quickly run out of combinations of letters which memorably/intelligibly represented those families' names, if we were limited to two-[selectable-]letter codes. (Module:families/data already includes 40 ISO-code-less families and their subfamilies, and may one day, as it becomes more complete and up-to-date, include at least four times that many, in my estimation.) - -sche (discuss) 22:16, 7 April 2014 (UTC)[reply]
  I think codes made out of three parts are too long. And we have 520 possible private use codes available, so I'm sure we can fit all the families we need into that. I'm not sure what you mean about proto-language codes. If we change the family codes, they would change too. So we would have qfi-pro for Proto-Finnic, qko-pro for Proto-Korean, qbs-pro for Proto-Balto-Slavic and so on. I really prefer that to the codes we have now. —CodeCat 23:22, 7 April 2014 (UTC)[reply]
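The figure of 520 follows from the ISO 639 range reserved for local use, qaa to qtz. The second letter runs from a to t (20 options) and the third from a to z (26 options), giving 20 × 26 = 520. A quick Python check:

```python
from itertools import product
from string import ascii_lowercase

# ISO 639-2/639-3 reserve qaa-qtz for local (private) use.
second_letters = ascii_lowercase[: ascii_lowercase.index("t") + 1]  # 'a'..'t'
private_use = ["q" + a + b for a, b in product(second_letters, ascii_lowercase)]

print(len(private_use))                 # 520
print(private_use[0], private_use[-1])  # qaa qtz
```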

Translation boxes only for English terms?

For example, I can see a translation box in the entry connive, but not in the Italian entry amare. Is there a rule that translation boxes with translations into multiple languages are added only to English terms? --Spiros71 (talk) 09:24, 3 April 2014 (UTC)[reply]

Yes, only English terms have translation sections. Foreign-language terms are translated only to English. —Stephen ^(Talk) 09:33, 3 April 2014 (UTC)[reply]

Some translingual entries have translation sections as well. — Ungoliant ^(falai) 10:56, 3 April 2014 (UTC)[reply]

Where is the part of speech/type tag list page?

Hi, I look for the complete list of part of speech/types tags: ====Noun====, adverb, initialism, etc. I didn't find it in Special pages. Thanks ! — This unsigned comment was added by 201.212.5.12 (talk).

WT:Entry layout explained/POS headers#Headers in use. — Keφr 20:17, 4 April 2014 (UTC)[reply]

That page is a bit outdated though. —CodeCat 20:55, 4 April 2014 (UTC)[reply]

Taming topic cat

Now that we have Lua, can we get rid of {{topic cat}}'s parameters? It should be a simple matter of parsing the page name to determine if it's a well-formed topical category name with a valid language code followed by a colon followed by something that could be a topic, and, if applicable, followed by a valid script name, then feeding the parts to a back end that could even be the unchanged code from the current topic cat. Later on, we should despaghettify the backend- but this seems like it could be implemented in an hour or two by anyone who knows what they're doing. The nice part is that it won't need any parameters, so it can just ignore those in all the current uses- no botting required.

I've been doing a lot of category stuff lately, and it's really annoying typing in all the necessary stuff, only to have it scream at you in ugly red because you didn't precisely match the category name in the precise format required.

The other side of the coin would be taken care of by an accelerated category adder that would know the language code for the section it was in and let you choose from a menu of existing categories- but that's for later. Chuck Entz (talk) 05:27, 5 April 2014 (UTC)[reply]
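The page-name parsing Chuck describes can be sketched in Python with one regular expression. Several details are assumptions made only for illustration. The small KNOWN_CODES set stands in for the real data in Module:languages. The optional script is assumed to appear as a parenthesised suffix such as "(Latn)", which may not match the category naming actually in use.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical stand-in for the language data in Module:languages.
KNOWN_CODES = {"en", "ja", "nl", "pt", "fiu-fin-pro"}

# "<code>:<topic>", optionally followed by an assumed " (<script>)" suffix.
TOPIC_CAT = re.compile(
    r"^(?P<code>[a-z]{2,3}(?:-[a-z]{2,3})*)"
    r":(?P<topic>[^()]+?)"
    r"(?: \((?P<script>[^()]+)\))?$"
)


class TopicCat(NamedTuple):
    code: str
    topic: str
    script: Optional[str]


def parse_topic_cat(pagename: str, known_codes=KNOWN_CODES) -> Optional[TopicCat]:
    """Split a topical category name into its parts, or return None."""
    if pagename.startswith("Category:"):
        pagename = pagename[len("Category:"):]
    match = TOPIC_CAT.match(pagename)
    if match is None or match.group("code") not in known_codes:
        return None  # not a well-formed topical category name
    return TopicCat(match.group("code"), match.group("topic"), match.group("script"))


print(parse_topic_cat("Category:en:Mammals"))
# TopicCat(code='en', topic='Mammals', script=None)
```

The parsed parts could then be handed to the existing back end, so the template itself would need no parameters.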

I was working on a replacement for {{catboiler}} (which powers {{poscatboiler}} and such) a while ago, see Module:User:CodeCat/category boilerplate. It was sort of finished and it's actually used in a few categories already, but it didn't have all the functionality that was necessary to completely replace it yet. Presumably, we would want to use this same module to handle {{topic cat}}, to avoid the proliferation of different pieces of code that all do more or less the same thing. —CodeCat 13:05, 5 April 2014 (UTC)[reply]

Something strange my bot did

diff. It should have added "pt" instead. I have checked my code several times and I still have no idea why it did that. So I don't really know how to fix it, but at least I'm reporting it so that it's known. —CodeCat 23:18, 5 April 2014 (UTC)[reply]

Well if you're expecting us to help fix it, you'll have to show the code. The most recent L2 before that line is most certainly ==Portuguese== and not ==Old Norse==. --Wiki Tiki 89 23:21, 5 April 2014 (UTC)[reply]

That's what's confusing me... —CodeCat 23:50, 5 April 2014 (UTC)[reply]

Ok, here is the code for the part that does the page edit. I'm using the pywikipedia framework and the mwparserfromhell library.

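# page: a pywikibot Page; text: its wikitext parsed with mwparserfromhell.parse()
if page.namespace() == 0:
    # Process each level-2 (language) section on its own
    for langsection in text.get_sections(levels=[2]):
        # The first node of a section is its ==Language== heading
        name = unicode(langsection.get(0).title).strip()
        code = None  # looked up lazily, at most once per section

        for template in langsection.filter_templates():
            # Only touch {{audio}} calls that have no lang= yet
            if template.name.matches("audio") and not template.has("lang"):
                # blib.get_language_code translates the name to its code using [[Module:languages]]
                code = code or blib.get_language_code(name)
                template.add("lang", code)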

—CodeCat 00:09, 6 April 2014 (UTC)[reply]

So text.get_sections() is a mwparserfromhell thing right? In that case the bug is probably in their code not yours. --Wiki Tiki 89 00:12, 6 April 2014 (UTC)[reply]

It does seem so. I removed all the excess code and it looks like it's grouping the Old Norse, Old Portuguese and Portuguese sections as one. So it's most likely a bug in their parser. I've reported it now. —CodeCat 00:30, 6 April 2014 (UTC)[reply]

They replied and said that it's caused by incomplete markup on the page. In this case it's a '' that isn't closed properly. So it's part of a larger problem, and they're working on fixing it. —CodeCat 12:07, 6 April 2014 (UTC)[reply]

Well, I search for L2 headers manually, so I won't have that problem. --kc_kennylau (talk) 12:12, 6 April 2014 (UTC)[reply]

You will if you ever come across a commented-out L2. --Wiki Tiki 89 17:32, 6 April 2014 (UTC)[reply]

/^==([^=]+)==/ won't find a commented-out L2. - Amgine/^t·e 22:09, 6 April 2014 (UTC)[reply]

The comment doesn't have to be on the same line. --Wiki Tiki 89 22:17, 6 April 2014 (UTC)[reply]

Ah, true. Hadn't considered that. - Amgine/^t·e 22:20, 6 April 2014 (UTC)[reply]
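A hand-rolled L2 finder can at least handle the multi-line comment case by removing HTML comments before matching headers. The minimal Python sketch below assumes that every comment is properly closed. It still ignores <nowiki>, <pre> and other constructs that can hide or fake a header, which is part of DTLHS's point below.

```python
import re

# HTML comments may span several lines, so match across newlines (DOTALL).
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# A level-2 header: exactly two '=' on each side, alone on its line.
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def l2_headers(wikitext: str) -> list[str]:
    """Return the names of the visible ==Language== headers, in order."""
    visible = COMMENT.sub("", wikitext)
    return [m.group(1).strip() for m in L2_HEADER.finditer(visible)]


sample = "==English==\n...\n<!--\n==Old Norse==\n-->\n==Portuguese==\n===Noun===\n"
print(l2_headers(sample))  # ['English', 'Portuguese']
```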

Yes, let's all roll our own wiki markup parsers, this surely isn't wasted effort. DTLHS (talk) 22:15, 6 April 2014 (UTC)[reply]

<curious look> Why wouldn't we? It's hardly as much time wasted as trying to debug someone else's code, since WMF still refuses to release an index parser. - Amgine/^t·e 22:20, 6 April 2014 (UTC)[reply]

Because it's better to have a common set of tools that behaves in a known way, that is tested by many people. Because it's much less of a barrier for newcomers to overcome if they don't have to write a template parser if they want to do anything interesting. DTLHS (talk) 02:42, 7 April 2014 (UTC)[reply]

Something like

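// e.g. $templateAndArguments = '{{audio|En-us-Bhojpuri.ogg|Bhojpuri|lang=en}}';
// Encode only the wikitext (not the whole URL); the endpoint is /w/api.php and the reply is JSON
$url = 'https://en.wiktionary.org/w/api.php?action=expandtemplates&prop=wikitext&format=json&text=' . urlencode( $templateAndArguments );
$parsedTemplate = file_get_contents( $url );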

or were you talking about some other, more complicated re-invention of the wheel? Did you really suggest pywikipediabot is newbie-oriented software? (Not everyone writes Python. Or should.) - Amgine/^t·e 04:15, 7 April 2014 (UTC)[reply]
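Here is the same expandtemplates request as a Python sketch. Only the parameter values are URL-encoded (urllib.parse.urlencode does this), and the endpoint is /w/api.php. The prop, format and formatversion values are standard Action API parameters. The User-Agent string is a placeholder to replace with real contact details, since Wikimedia's User-Agent policy asks clients to identify themselves.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def expandtemplates_url(wikitext: str) -> str:
    """Build an action=expandtemplates request; urlencode escapes only the values."""
    params = {
        "action": "expandtemplates",
        "text": wikitext,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def expand_templates(wikitext: str, user_agent: str = "ExampleTool/0.1 (placeholder contact)") -> dict:
    """Fetch the expansion and return the decoded JSON response (network call)."""
    request = urllib.request.Request(expandtemplates_url(wikitext), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `expand_templates(...)` needs network access. Reading the expanded text out of the returned dict is left to the caller.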

mw:API:Parse with the generatexml option satisfies most of my parsing needs. There are some limitations, though. — Keφr 05:46, 7 April 2014 (UTC)[reply]

Automatic links to words in multiple-word entries?

Is it possible to include some lines in Module:headword so that I don't need to do this every time I see some multiple-word entries? --kc_kennylau (talk) 01:24, 6 April 2014 (UTC)[reply]

Not everything with a space is composed of two words that actually exist. DTLHS (talk) 01:25, 6 April 2014 (UTC)[reply]

And furthermore, complicating matters, there are some strings of words like "A B C" which are composed of (and which we prefer to link as) "A B" + "C", not "A" + "B" + "C". - -sche (discuss) 01:31, 6 April 2014 (UTC)[reply]

It could be made to default to the common case, with an explicit "head=" statement overriding the default. DCDuring TALK 01:37, 6 April 2014 (UTC)[reply]

Or we could even have the default be no auto-linking, but allow a special head=+ case to enable auto-linking. --Wiki Tiki 89 07:11, 6 April 2014 (UTC)[reply]

What if the head really is +? --kc_kennylau (talk) 07:40, 6 April 2014 (UTC)[reply]

You can write it as +. Though personally, I really dislike inventing ad-hoc special-case syntax. It is a first step towards a large unmaintainable mess. — Keφr 07:55, 6 April 2014 (UTC)[reply]

If we are going to have automated linking, making linking the default which head suppresses or alters (as DCDuring suggests) seems preferable to making it so that users turn on linking by filling in head= with something other than the linking they want. (If you're filling in head=, just go ahead and fill in what head equals.) - -sche (discuss) 08:45, 6 April 2014 (UTC)[reply]

+1 —Ruakh_TALK 19:37, 6 April 2014 (UTC)[reply]

But it should be ignored if someone literally writes head= with no value, right? —CodeCat 19:47, 6 April 2014 (UTC)[reply]

I want the behavior described by DCDuring. Will make editing easier. --Vahag (talk) 08:47, 6 April 2014 (UTC)[reply]

I like this. It would definitely be useful in cases like yêu nhiều thì ốm, ôm nhiều thì yếu. Wyang (talk) 07:03, 7 April 2014 (UTC)[reply]

I've thought about implementing something like this, but there are some problems. It's not always desirable to split the terms. For example, it's not so useful to link to the parts of an inflected form of a multi-word verb, such as gave up. There are probably other cases where splitting doesn't make sense either. —CodeCa t 14:24, 10 April 2014 (UTC)[reply]

In the case of gave up, I think it is useful to link to the parts. --Wiki Tiki 89 15:12, 10 April 2014 (UTC)[reply]

I support the feature, even if the links may be to non-lemma forms and many languages don't have inflected form entries in Wiktionary. "[[gave]] [[up]]" can be converted to "[[give|gave]] [[up]]" manually. It's easier than inserting each pair of square brackets manually. --Anatoli ^{(обсудить}/^вклад) 23:27, 10 April 2014 (UTC)[reply]

@-sche, Atitarev, CodeCat, Wikitiki89, SemperBlotto, Ruakh, Vahagn Petrosyan, Wyang, DTLHS Well? --kc_kennylau (talk) 18:26, 19 April 2014 (UTC)[reply]

I've made the change in Module:headword. It only splits on spaces, just to be safe for now, and it only applies it if no headword was already provided. This means that it will explicitly not work with any template that gives something like |head={{{head|{{PAGENAME}}}}}, because the {{PAGENAME}} will override the default. —CodeCa t 14:14, 23 April 2014 (UTC)[reply]
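A minimal sketch of the default described above, in plain Lua. The function name default_head and its arguments are hypothetical, and this is not the actual Module:headword code. It treats an empty head= the same as a missing one (as asked earlier in the thread), splits on spaces only, and, as an assumption, leaves single-word page names unlinked.

```lua
-- Hypothetical sketch; not the actual Module:headword implementation.
local function default_head(pagename, head)
	-- An explicit, non-empty head= always wins; head= with no value counts as missing.
	if head ~= nil and head ~= "" then
		return head
	end
	-- Assumption: single-word titles are returned unlinked.
	if not pagename:find(" ", 1, true) then
		return pagename
	end
	-- Split on spaces only and link each piece separately.
	local parts = {}
	for word in pagename:gmatch("[^ ]+") do
		parts[#parts + 1] = "[[" .. word .. "]]"
	end
	return table.concat(parts, " ")
end
```

As noted above, a template that always passes |head={{{head|{{PAGENAME}}}}} supplies a non-empty head, so the default branch is never reached for it.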

For something like give up the ghost, I think we would like to link it as give up the ghost. I don't think we want to link all the "of" and "the" instances in headers, although I imagine those could be excluded by default. bd2412 T 15:14, 23 April 2014 (UTC)[reply]

So, in cases such as that, the linking should be specified (not left to default). SemperBlotto (talk) 15:17, 23 April 2014 (UTC)[reply]

Would we have to fix all existing headwords containing common prepositions or definite articles to specify this? I think it would be easier from a maintenance standpoint (though obviously not from a programming standpoint) to exclude such terms in the first place. bd2412 T 15:20, 23 April 2014 (UTC)[reply]

You're looking at it from the standpoint of someone who already has a good command of the language, and knows such common words already. But it's very conceivable that someone who has no knowledge of Italian will wonder what il means and a link to it would certainly be helpful to them. —CodeCa t 15:44, 23 April 2014 (UTC)[reply]

Purely from an aesthetic standpoint, it looks really ugly to me when small words are not linked, making the headword look like it's not all one piece. give up the ghost looks much better as a headword than give up the ghost. --Wiki Tiki 89 15:54, 23 April 2014 (UTC)[reply]

You need to consider punctuation (see yêu nhiều thì ốm, ôm nhiều thì yếu as mentioned above) DTLHS (talk) 15:49, 23 April 2014 (UTC)[reply]

Could someone give a list of cross-language punctuation? Also, I'm not sure how to split the words while preserving the punctuation. Scribunto doesn't seem to provide a function for that; it always throws the punctuation away when it splits. —CodeCa t 16:12, 23 April 2014 (UTC)[reply]

Don't we already have a list of punctuation in one of the linking modules? (I don't remember which one.) DTLHS (talk) 16:14, 23 April 2014 (UTC)[reply]

Module:languages#Language:makeEntryName is probably what you mean, but that one only removes some punctuation. It doesn't include periods, commas, hyphens, colons etc. Colons and hyphens in particular should not necessarily be unlinked. In Finnish, the colon is used to separate abbreviations from their case ending, and the hyphen is used for the same purpose in Slovene. Dutch uses the apostrophe for that purpose. It's very hard to come up with cross-linguistic rules. —CodeCa t 16:22, 23 April 2014 (UTC)[reply]

In that case I guess you should just create a tracking category and review everything in it. DTLHS (talk) 16:23, 23 April 2014 (UTC)[reply]

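-- Demonstration (Scribunto): link each space-separated word, keeping trailing punctuation outside the link
local name = "testing, testing, and testing."
local words = mw.text.split(name, ' ', true)  -- plain (non-pattern) split on single spaces
for i, word in ipairs(words) do
	-- base = word without its trailing punctuation, trail = that punctuation (possibly empty).
	-- '-' is escaped as '%-' so the set is not read as the range '.' to '_'.
	local base, trail = mw.ustring.match(word, '^(.-)([,.%-_?!\'"()%[%]{}@*#$%%^&]*)$')
	if base ~= '' then
		words[i] = '[[' .. base .. ']]' .. trail
	end  -- punctuation-only items are left as they are
end
name = table.concat(words, ' ')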

(not tested, punctuation list not complete, this is just a demonstration) --kc_kennylau (talk) 16:46, 23 April 2014 (UTC)[reply]
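On the question of splitting while keeping the punctuation: one option in plain Lua is string.gsub with a function as the replacement, which rewrites each run of non-space characters in place and leaves the spaces untouched. This is only a sketch, not tested on-wiki; link_words and its default punctuation set are hypothetical. The default set deliberately leaves out the colon, hyphen and apostrophe, since those can belong to a word (Finnish, Slovene, Dutch), as noted above.

```lua
-- Hypothetical sketch in plain Lua (not tested on-wiki).
-- punct_set is a Lua pattern set of trailing characters to keep outside the link.
local function link_words(text, punct_set)
	punct_set = punct_set or "[,.;!?]"  -- no ':', '-' or "'" by default
	local pattern = "^(.-)(" .. punct_set .. "*)$"
	-- gsub with a function replaces each run of non-space characters in place,
	-- so the spaces between words are preserved exactly.
	local linked = text:gsub("%S+", function(word)
		local base, trail = word:match(pattern)
		if base == "" then
			return word  -- punctuation-only token: leave it unlinked
		end
		return "[[" .. base .. "]]" .. trail
	end)
	return linked
end
```

Because the ASCII punctuation bytes never occur inside a multi-byte UTF-8 character, this byte-based version also works on text such as yêu nhiều thì ốm, ôm nhiều thì yếu.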

Placeholder words like one, one's, oneself, someone/somebody, and possibly 's should either link to an appendix on the use we make of placeholders or to definitions in the entries for the words written with a view toward this kind of use.

It would also be useful if links to MWEs had some kind of faint underlining to convey that the entire MWE rather than the constituent terms were the object of the link. DCDuring TALK 20:25, 23 April 2014 (UTC)[reply]

Something like that would be very complex to do for every single language, so I don't think it's a good idea. And I'm actually surprised that you're suggesting it, as you normally seem opposed to complexity. —CodeCa t 20:28, 23 April 2014 (UTC)[reply]

There needs to be a way to explicitly disable this without retyping the whole headword. --Wiki Tiki 89 21:10, 23 April 2014 (UTC)[reply]

@CodeCat Are we going to ignore the punctuation marks or not? --kc_kennylau (talk) 09:12, 26 April 2014 (UTC)[reply]

Can we import pronunciations?

As someone who started learning French without knowing anything about the pronunciation, I find myself having to go to fr.wikt pretty often for pronunciation information. Is it possible to import these pronunciations to en.wikt by bot? I imagine it could be helpful as fr.wikt is pretty good about having them for French words, and other wiktionaries are as well (I think de.wikt, for example). Ultimateria (talk) 03:45, 7 April 2014 (UTC)[reply]

I think we might have to keep the attribution, which could be difficult. Perhaps the bot could sift through the history and add a link in the edit summary to the original editor who added the pronunciation... --Yair rand (talk) 03:48, 7 April 2014 (UTC)[reply]

The copyright notice when you submit an edit says, "You agree that a hyperlink or URL is sufficient attribution". I guess that's a bit vague now that I think about it, but I've always taken it to mean, a link to the page that's being copied from, whose edit-history then identifies you. (I think the only reason that Transwikis import the edit history is that it's expected that the source-page might then be deleted, which would destroy the edit-history and therefore the attribution.) (Actually, I guess you're probably talking about plagiarism rather than copyright, since the pronunciation information itself is not copyrightable, and we would not be copying the expression of it; but there as well, I think that linking to the source-page is sufficient attribution.) —Ruakh_TALK 05:38, 7 April 2014 (UTC)[reply]

Tbot (talk • contribs) used to do this, and mindlessly imported pronunciations from the Korean Wiktionary (Category:Tbot entries (Korean)), which uses a different set of IPA symbol conventions from the Korean editors here. There are still hundreds of pages in that category uncleaned-up. Wyang (talk) 03:54, 7 April 2014 (UTC)[reply]

We could use the audio-file link, though, wherever it exists. Not sure if that's what the original question meant. --Anatoli ^{(обсудить}/^вклад) 04:13, 7 April 2014 (UTC)[reply]

But, couldn't we do it....not blindly? I mean, couldn't we analyze the French Wiktionary's IPA standards, and see if they are compatible with ours, and then proceed if everything's kosher? It seems like the addition of large amounts of content would be worth examining, right? Yair's thoughts on attribution are important to consider, but, as they state, probably surmountable. -Atelaes λάλει ἐμοί 05:08, 7 April 2014 (UTC)[reply]

Agreed. —Ruakh_TALK 05:38, 7 April 2014 (UTC)[reply]

I think DerbethBot already imports all audio files. --Yair rand (talk) 05:25, 7 April 2014 (UTC)[reply]

Make module text output call template

Hi all. Is it possible to let the text output '{{temp|aaaa}}: {{temp|aaaaa}}' from a module be displayed as '{{aaaa}}: {{aaaaa}}' rather than just a string '{{temp|aaaa}}: {{temp|aaaaa}}'? Thanks in advance, Wyang (talk) 06:27, 7 April 2014 (UTC)[reply]

Short answer: no. Templates cannot be called from the output of a Module. --Wiki Tiki 89 06:33, 7 April 2014 (UTC)[reply]

But Module:it-conj can build a table? --kc_kennylau (talk) 06:47, 7 April 2014 (UTC)[reply]

A table is not a template, it's just special wiki syntax. --Wiki Tiki 89 06:49, 7 April 2014 (UTC)[reply]

OK... I have changed the code (Module:vi-pron) to avoid this. Thanks. Wyang (talk) 07:03, 7 April 2014 (UTC)[reply]

@Wyang{{#invoke:User:kc_kennylau/sandbox|temp}}--kc_kennylau (talk) 08:48, 7 April 2014 (UTC)[reply]

What about other templates, such as '{{ko-conj-adj|stem1=노랗|stem2=노래|stem3=노라|haet=노랬|hal=노랄|ham=노람|han=노란|stem1_r=nora|stem1a_r=norat|stem2_r=norae|cstem=ㅎ}}'? Wyang (talk) 12:22, 7 April 2014 (UTC)[reply]

Templates can be called inside modules. Look up the "expandTemplate" function in the Scribunto documentation. —CodeCa t 12:57, 7 April 2014 (UTC)[reply]

Just to point out, I didn't say they can't. I said they cannot be called in the output of a Module. They need to be expanded within the module but it is not such a good idea and should be avoided if possible. --Wiki Tiki 89 16:04, 7 April 2014 (UTC)[reply]

But what is the "output" of a module? To me, it's the text that the module returns as its result. So it's not something anything can be "in". —CodeCa t 16:07, 7 April 2014 (UTC)[reply]

The text is "in" it, with wiki syntax and everything. People are used to calling templates in renderable text and often don't know the difference between templates and wiki syntax. --Wiki Tiki 89 16:09, 7 April 2014 (UTC)[reply]

I think it applies to anything with { } in it. So that includes templates, magic words ({{PAGENAME}} and such) and parser functions like #if and #invoke. —CodeCa t 16:16, 7 April 2014 (UTC)[reply]

Those are essentially templates. Wiki syntax is also tables, markup, etc. --Wiki Tiki 89 16:36, 7 April 2014 (UTC)[reply]

@Wikitiki89 Well, you could just put the content of the template in directly, as I did. --kc_kennylau (talk) 18:28, 19 April 2014 (UTC)[reply]

But then it's not a template anymore. --Wiki Tiki 89 18:43, 19 April 2014 (UTC)[reply]
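For reference, the two Scribunto frame methods relevant to this thread are frame:expandTemplate{ title = ..., args = ... }, which expands one template call, and frame:preprocess(text), which runs a whole wikitext string (templates, parser functions, magic words) through the parser. A minimal sketch follows; the function name render is hypothetical, and as pointed out above, expanding templates from inside a module is best avoided where the output can be built directly. In a real module the function would be a field of the table the module returns.

```lua
-- Hypothetical module function; in Scribunto `frame` is the object passed
-- to the function named in {{#invoke:}}.
local function render(frame)
	-- Expand one template call with explicit named arguments
	-- (a few of the arguments from the {{ko-conj-adj}} example above).
	local conj = frame:expandTemplate{
		title = "ko-conj-adj",
		args = { stem1 = "노랗", stem2 = "노래", stem3 = "노라" },
	}
	-- Or hand a wikitext string containing template calls to the parser.
	local heads = frame:preprocess("{{aaaa}}: {{aaaaa}}")
	return heads .. "\n" .. conj
end
```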

How do I build testing modules?

Module:User:kc_kennylau/sandbox? Is this allowed? --kc_kennylau (talk) 07:00, 7 April 2014 (UTC)[reply]

Sure. All the same rules that apply to other Username-space content would apply, namely that it has something vaguely to do with building a dictionary, and that you're not using it as a free alternative to MySpace. -Atelaes λάλει ἐμοί 07:14, 7 April 2014 (UTC)[reply]

Thank goodness. My free alternative to Twitter is safe. — Keφr 08:32, 7 April 2014 (UTC)[reply]

Template:ttbccatboiler needs minor fix

This edit: (diff) is no doubt an improvement, but it left things kind of messy (see Category:Translations to be checked (German)). Chuck Entz (talk) 14:07, 7 April 2014 (UTC)[reply]

Lua replacement for Template:langrev running out of memory

I tried replacing calls to {{langrev}} in {{ttbc}} and {{trreq}} to use the Lua equivalent, which is Module:languages/templates#getLanguageByCanonicalName, which is a wrapper around Module:languages#getLanguageByCanonicalName. Unfortunately, when I did that, many pages using these templates started showing server errors. Apparently Lua ran out of memory, so whoever wrote it thought it would be best to just let the whole page crash. Not so good.

In any case, that means that we're not able to replace this template with its Lua equivalent just yet. We need to find a better way to do it. Does anyone have suggestions? —CodeCa t 17:18, 7 April 2014 (UTC)[reply]

Create a module which computes a reverse-lookup table, mw.loadData it, and index that. That is, do what I did at Module:User:Kephir/test1 and Module:User:Kephir/test2. This might be lighter on memory. Just a conjecture, though. — Keφr 16:28, 8 April 2014 (UTC)[reply]

What would the difference be? --kc_kennylau (talk) 13:22, 10 June 2014 (UTC)[reply]

The difference between what? — Keφr 13:28, 10 June 2014 (UTC)[reply]

Between calling a module which calls the table and calling the table directly. --kc_kennylau (talk) 13:34, 10 June 2014 (UTC)[reply]

For regular invocations and modules loaded using require, module code is executed once per invocation. However, modules loaded with mw.loadData (docs) are executed only once per page rendering cycle, and the results are cached for subsequent invocations. This saves us some time. However, I noticed some overhead when accessing the generated data, which might be noticeable in large-scale deployment. — Keφr 13:49, 10 June 2014 (UTC)[reply]
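A sketch of the reverse-lookup idea: invert the code-to-name table once, then index it. The table contents and function names are hypothetical. In Scribunto the inverted table would sit in its own data module and be read with mw.loadData, so the inversion runs once per page render; here it is inlined so the sketch runs as plain Lua. A module read with mw.loadData has to return plain data (tables, strings, numbers, booleans), not functions.

```lua
-- Hypothetical, tiny stand-in for the language data (code -> canonical name).
local by_code = {
	en = "English",
	nl = "Dutch",
	sh = "Serbo-Croatian",
}

-- Invert the table once; a repeated canonical name would make the inverse ambiguous.
local function build_reverse(data)
	local by_name = {}
	for code, name in pairs(data) do
		if by_name[name] ~= nil then
			error("duplicate canonical name: " .. name)
		end
		by_name[name] = code
	end
	return by_name
end

-- In Scribunto this table would be the return value of a data module read
-- with mw.loadData, so the inversion runs once per page render.
local by_name = build_reverse(by_code)

-- Each lookup is then a single table index.
local function get_code_by_canonical_name(name)
	return by_name[name]
end
```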

Editing subpages

The [edit] button at the top of each section in the tea room seems to have disappeared - now if I want to make a comment I have to go to the monthly subpage and edit there instead. But this is only happening in the tea room - not in the grease pit, the beer parlour, or anywhere else that I know of. What's going on? —Mr. Granger (talk • contribs) 21:38, 7 April 2014 (UTC)[reply]

That must have something to do with the permissions, which I changed recently. Until we find a way around it, I'll revert it back. --Wiki Tiki 89 21:42, 7 April 2014 (UTC)[reply]

Does anyone use CSVLoader

I had a problem: its current version forces capitalization. I thought I would simply ask its creator, w:User:Ganeshk, for the earlier version, but I doubt he will even see my message, as he is apparently on a break and his talk page is being spammed by the signpost thing and auto-archived (a couple more days and my message goes in the trash...)

Do any of you guys happen to have a version of CSVLoader that works on Wiktionary? Neitrāls vārds (talk) 06:53, 8 April 2014 (UTC)[reply]

A new type of collapsible content

I've recently thrown together a little javascript snippet to do some collapsing on {{grc-pron}}, which can be seen at User:Atelaes/viewSwitching.js. I was wondering what folks would think about adding it to our Common.js. Essentially, what it does is switch between two different representations of the pronunciation of an Ancient Greek word, one which is more compact, taking up only a single line, and one which takes up more space, but has more detail. My goal was to try and reduce the amount of pre-definition space, while retaining our current level of detail for those who want it. It, of course, integrates with Conrad's hiding infrastructure. I also tried to make the javascript fairly general, such that other templates could make use of it. Any feedback is appreciated. Thanks. -Atelaes λάλει ἐμοί 21:34, 8 April 2014 (UTC)[reply]

I don't think your addition is bad as such, but there was recently some discussion about migrating away from our in-house collapsing code and using the built-in MediaWiki code instead. I do think that's something we should look into, so that would affect your code as well. —CodeCa t 21:38, 8 April 2014 (UTC)[reply]

I only half-tried, but I wasn't able to find any detailed documentation of the MW collapsing content. Could it do what my code does? -Atelaes λάλει ἐμοί 21:44, 8 April 2014 (UTC)[reply]

I haven't really worked with it at all, so I don't know. All I know is that, for collapsible tables, it collapses the table itself instead of the surrounding div. That alone is a big advantage, I think. —CodeCa t 21:46, 8 April 2014 (UTC)[reply]

Ok, after looking at the manual, and the source code, it doesn't look like Mediawiki's built-in can do switching, only hiding and unhiding, so I'm going to persist in championing my code. Additionally, I wonder if the built-in code is really ideal for our purposes at all. As so often happens, the code is really built with Wikipedia in mind, not Wiktionary (something I can't fault them for, they are certainly more important than we are). Specifically, it's slow and not centrally controllable, unlike ours. That's fine if you want to open one of the two hidden tables at the end of a Wikipedia article, but not fine if you want to blow up twenty consecutive inflection or translation tables, something I find myself doing from time to time. It's also not fine if there's a specific class of content you always want shown or hidden, which is a genuine use-case on our project. That being said, if we decide not to use the built-in's here, there may well be improvements we could make to our home-brew stuff. -Atelaes λάλει ἐμοί 22:41, 8 April 2014 (UTC)[reply]

Definitely, yes. Not having to wrap tables in divs would be a good start. —CodeCa t 22:51, 8 April 2014 (UTC)[reply]

We can use templates for tracking things too

We've been using tracking categories so far, which are quite useful. But they are awkward to use from within modules. I tried out something else instead: using templates for tracking. Modules are able to transclude templates, but they're not necessarily required to use the output of that template in any way. However, the act of transcluding, in itself, causes the page to appear as a transclusion for that template. So this can be used in much the same way as tracking categories are. So instead of creating categories and adding entries to them, you create empty templates and transclude them. We don't have any proper system for this yet but I propose we create Template:tracking and use subtemplates of that, as necessary. —CodeCa t 14:21, 9 April 2014 (UTC)[reply]

I fail to see the advantage over tracking categories. --Wiki Tiki 89 14:23, 9 April 2014 (UTC)[reply]

Like I said, they can be used from modules, which is much harder for categories. I'm also not proposing that we choose one or the other, it's more like I'm making it known that this alternative method exists, and we can use it too when necessary. —CodeCa t 14:24, 9 April 2014 (UTC)[reply]

Oh I see what you mean. You have to output categories, but not templates. That seems like a bad workaround for something that we should just ask the developers to develop. --Wiki Tiki 89 14:30, 9 April 2014 (UTC)[reply]

Yes, exactly. The nice part about template transclusion is that it works "outside" the invoke/result system, so it doesn't disrupt the normal functioning of a module or template. Tracking categories and tracking templates are both workarounds of course, but they work, so we might as well use them. —CodeCa t 14:34, 9 April 2014 (UTC)[reply]

I don't think a tracking category is a workaround; I think categories are exactly what should be tracking things. Maybe we can get the developers to add a way to include categories without outputting anything. For now, though, I see no problem with using template transclusions as a workaround. --Wiki Tiki 89 14:38, 9 April 2014 (UTC)[reply]

I suppose that's true. But then it could be argued that the "what links here" feature should actually be a category. Furthermore, categories are distinguished in that they are actual wiki pages and can have content other than the entries listed there. I don't think categories were ever intended to be used in the way we use them when Wikipedia was first made. —CodeCa t 14:47, 9 April 2014 (UTC)[reply]

It doesn't matter what was intended. What matters is does it make sense to use categories for that? And I think it does make sense. --Wiki Tiki 89 15:11, 9 April 2014 (UTC)[reply]

One could argue that because of the above-mentioned impracticality, it does not. — Keφr 15:21, 9 April 2014 (UTC)[reply]

Well I mean it conceptually makes sense, which is why we've been using tracking categories since well before Lua came around. Using template transclusions to track things makes much less conceptual sense, but as you point out it currently makes more practical sense. Although, conceptual sense is probably even more subjective, so feel free to disagree. --Wiki Tiki 89 15:27, 9 April 2014 (UTC)[reply]

Can you give an example of this? — Ungoliant ^(falai) 14:29, 9 April 2014 (UTC)[reply]

Look at what I did in Module:languages/templates. Something like that would be impossible to do with categories. —CodeCa t 14:30, 9 April 2014 (UTC)[reply]

Looks good. I support the proposition. — Ungoliant ^(falai) 14:39, 9 April 2014 (UTC)[reply]

But categories should be preferred, as they are easier to navigate and to find than what-links-here pages. — Ungoliant ^(falai) 16:45, 9 April 2014 (UTC)[reply]

@CodeCat There is a flaw in this plan. You tried to use the frame object without knowing what it is. Quote from mw:Lua reference manual#Frame object: The frame object is the interface to the parameters passed to {{#invoke:}}, and to the parser. Thus, it cannot just be used from any module; it needs to have the frame passed to it from the originally invoked module. --Wiki Tiki 89 16:25, 19 April 2014 (UTC)[reply]

But what about mw.getCurrentFrame? —CodeCa t 16:38, 19 April 2014 (UTC)[reply]

Ok, I guess you can use that too. But in the situation below, that was not done. --Wiki Tiki 89 16:42, 19 April 2014 (UTC)[reply]
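A minimal sketch of the tracking-template idea using mw.getCurrentFrame, so that no frame has to be passed down from the invoked module. The helper name track and the Template:tracking/... naming follow CodeCat's proposal above but are assumptions. The expansion result is thrown away; only the transclusion matters, which lists the page under Special:WhatLinksHere for that template.

```lua
-- Hypothetical helper: each tracking key is an empty subpage of
-- Template:tracking (names are assumptions based on the proposal above).
local function track(key)
	-- mw.getCurrentFrame() returns the frame of the current #invoke,
	-- so callers deep inside other modules do not need a frame argument.
	local frame = mw.getCurrentFrame()
	-- With no namespace given, the title is looked up in the Template namespace.
	-- The returned text is discarded; only the transclusion is recorded.
	frame:expandTemplate{ title = "tracking/" .. key }
	return true
end
```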

What should Wikimedia link templates do with "invalid" language codes?

There are a number of pages showing errors right now, because they use a Wikimedia link template like {{wikipedia}} with a language code that Wiktionary doesn't recognise. For example, hr (Croatian, considered part of sh here), simple (Simple English, considered part of en) and so on. We certainly do want to be able to link to these Wikipedias, but their non-standard (from our perspective) codes are a bit of a problem. How would this best be solved? —CodeCa t 13:39, 11 April 2014 (UTC)[reply]

Why not an explicit list of exceptions for use in the kind of templates where such use is known to be valid? Occasional processing of the dumps could find new exception-template combinations, because it doesn't seem worthwhile to waste more categories on it, though it might be.

I never thought that we could have a perfectly accurate and complete list of language codes for all conceivable purposes, especially as the variable used by such codes seems to tempt folks to use it for other purposes, to which temptation some folks inevitably will succumb.

Would this even be a problem if our modules weren't so designed to throw ugly error indications at the failure of a variable to be on a defined, finite, but very large list? DCDuring TALK 14:00, 11 April 2014 (UTC)[reply]

I suppose we could make something along the lines of {{wikimedia language}}, but in reverse. —CodeCa t 14:31, 11 April 2014 (UTC)[reply]

Wikimedia project language codes aren't the same as our language codes, so they shouldn't be processed by the same code that processes our language codes. Why even bother checking against our language data modules, since there are so many of our codes that don't have projects, and several projects with codes that don't match ours? We should check the language codes in Wikimedia link templates against lists of valid Wikimedia project language codes, after first converting the WT language codes that are different into their WM equivalents. Chuck Entz (talk) 21:49, 12 April 2014 (UTC)[reply]

The problem is that some templates like {{wikipedia}} apply language-specific formatting to the text. For example, the entry Загреб in Serbo-Croatian needs to tag the actual name of the article in the box as Serbo-Croatian and apply Cyrillic styling to it. Language/script tagging goes via {{lang}}, which uses our usual script detection routine to figure out the script based on the given text. And that, in turn, requires looking up the language and knowing what scripts it uses. That's where it fails. —CodeCa t 22:03, 12 April 2014 (UTC)[reply]

And why exactly has this suddenly started to be a problem? Chuck Entz (talk) 23:15, 12 April 2014 (UTC)[reply]

Because the supporting templates, like {{lang}}, were converted to use Lua. That doesn't mean there wasn't a problem before, of course. It just means that now it's more obvious that there is one. Things changed specifically with this edit, which switched from {{script helper}} to {{lang}}. {{script helper}} has no Lua support, it just outputs whatever language and script you give to it (it's what all our script code templates use underlyingly). However, you can see that there was also some Lua in the old version: it retrieved the first script of the given language code. It only did that if there wasn't already a script specified, so an error was avoided back then by specifying the script in the entry, like at Загреб. So that means that linking with the code "sr" or "hr" with no sc= parameter would have triggered an error anyway, which is hardly proper behaviour for the template. The proper solution here could be for the template (and any like it) to recognise that "sr" is not a valid code on Wiktionary, and convert it to one that is valid before it's given to {{lang}}.

To confuse the matter, though, there's already a conversion step for the language code in that template, using {{wikimedia language}}. That template does the exact opposite: it takes a Wiktionary code and translates it into a Wikimedia code, like for example nan > zh-min-nan or nb > no. This conversion step is also used for other external linking templates, like the interwiki links in our translation template {{t+}}. That means that, in principle, the lang= parameter on {{wikipedia}} and {{t+}} specifies the Wiktionary-internal code, and not the Wikimedia code, and this lets us write {{t+|nb|word}} and generate word (no) with the link "corrected" to point to no.wiktionary instead.

Such a translation step works fine if you can uniquely determine the Wikimedia code from the Wiktionary code. But it fails when there's more than one wiki in what we consider the same language, like for English/simple or for the three varieties of Serbo-Croatian. {{wikipedia}} isn't the only template with this problem. {{t+}} also has the same issue; it's currently impossible to link to the Bosnian, Serbian or Croatian Wiktionaries using {{t+}}. {{t+|sh|something}} only links to the Serbo-Croatian Wiktionary: something (sh).

So really, the issue is not in the changes I've made. They've only exposed a deeper flaw in the thought process that has gone into these templates and how they're meant to work. In the end, we need to decide, is the lang= parameter supposed to specify a Wiktionary code, and if so, how do we deal with cases where that code does not uniquely define the Wikimedia code to link to? This is something we shouldn't just answer for {{wikipedia}} and the likes, but for {{t+}} as well. —CodeCa t 00:06, 13 April 2014 (UTC)[reply]

User:Kephir has gone ahead and created a template that does the reverse conversion. Unfortunately, it's creating more errors than we already had, and the number is still climbing. It also doesn't address the underlying problem that I described above. —CodeCa t 13:09, 15 April 2014 (UTC)[reply]
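The asymmetry described above can be shown with two small lookup tables. They contain only the code pairs mentioned in this thread, and the function names are hypothetical; this is not {{wikimedia language}} or Kephir's reverse template. The forward direction gives one project code per Wiktionary code, but several project codes fold into sh or en, so starting from a Wiktionary code there is no way to reach, say, the Croatian project.

```lua
-- Hypothetical tables holding only the pairs mentioned in this thread.
-- Wiktionary code -> Wikimedia project code (where they differ).
local to_wikimedia = {
	nan = "zh-min-nan",
	nb = "no",
}

-- Wikimedia project code -> Wiktionary code. Several projects fold into one
-- Wiktionary language, so this direction is many-to-one.
local to_wiktionary = {
	["zh-min-nan"] = "nan",
	simple = "en",
	hr = "sh",
	sr = "sh",
	bs = "sh",
}

-- Codes without an entry are assumed to be identical on both sides.
local function wikimedia_code(wikt_code)
	return to_wikimedia[wikt_code] or wikt_code
end

local function wiktionary_code(wm_code)
	return to_wiktionary[wm_code] or wm_code
end
```

For example, wikimedia_code(wiktionary_code("hr")) gives "sh", not "hr": the round trip loses the information about which project was meant.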

Tabbed languages problem

Can anyone figure out what's making tabbed languages break at tej? Lower Sorbian and Polish are being treated as subheaders of Hungarian, but I can't figure out why. —Aɴɢʀ (talk) 18:42, 11 April 2014 (UTC)[reply]

Columns templates. Fixed. — Keφr 19:11, 11 April 2014 (UTC)[reply]

Thanks. I wasn't the one who used a line-initial ";" for formatting, but I wasn't aware it isn't compatible with tabbed languages. —Aɴɢʀ (talk) 19:18, 11 April 2014 (UTC)[reply]

I don't think that was the problem. The problem was in {{top4}}, which Kephir fixed. The line-initial semicolon was just bad formatting. -Atelaes λάλει ἐμοί 19:22, 11 April 2014 (UTC)[reply]

Oh, okay. Here's another problem: {{R:lv:LEV}} adds a category to the pages where it's transcluded, but it puts the category in the top (usually English) section instead of in the section where it's transcluded (usually Latvian). Anything we can do about that? —Aɴɢʀ (talk) 22:14, 11 April 2014 (UTC)[reply]

Well, I'm pretty sure the problem is that it ends up being the first category, which I'm assuming is because it uses {{catlangname}} instead of simply creating a category manually. Yair's done a lot of work on tabbed languages since I worked on it, and they were the one who wrote the category sorting in my incarnation, so I'm not 100% sure on this, but I believe that tabbed languages does not ignore the order in which categories appear. This is useful because not all categories are easily placed in a language just by reading their name. Would it be a significant detriment to drop the use of {{catlangname}} and code the category manually? -Atelaes λάλει ἐμοί 22:46, 11 April 2014 (UTC)[reply]

{{catlangname}} shouldn't be causing the problem here. It expands to the category wikicode, as if you had typed it manually, but it does some processing so it's "smarter". I'd be very surprised if changing it back to a manual category fixes the problem. @Angr Could you give an example of an entry that has the problem? —CodeCa t 23:17, 11 April 2014 (UTC)[reply]

vasara is the example I looked at. In this case the category appears correctly in the Latvian tab, but so do all the Finnish categories. The problem is that it comes before them. Why would that be? -Atelaes λάλει ἐμοί 23:36, 11 April 2014 (UTC)[reply]

I suspect it's because of the <ref> tags. They're handled specially, and apparently the wiki software lists the categories listed in references before it lists the categories in the rest of the page. —CodeCa t 00:06, 12 April 2014 (UTC)[reply]

Ah. I did not notice that. That would make sense. -Atelaes λάλει ἐμοί 00:23, 12 April 2014 (UTC)[reply]

The page I found the problem on was actually ass, and using {{catlangname}} isn't the problem because it was adding a category directly until I changed it to use {{catlangname}}, which I did in hopes that that would solve the problem. (It didn't.) —Aɴɢʀ (talk) 07:34, 12 April 2014 (UTC)[reply]

Small bug in orange links gadget

I found a small bug in the gadget that turns links orange when the page exists, but doesn't have a language section of the same name. On Appendix:Proto-Germanic/furi, the Dutch link to veur#Dutch is blue, but the actual page only contains a "Dutch Low Saxon" section, none for "Dutch". Is it only looking at the first part of the name? —CodeCa t 01:38, 13 April 2014 (UTC)[reply]

I believe that's exactly what's happening. My jQuery is crap (something I need to remedy at some point), but it looks like the code is simply using .inArray(), which, according to the documentation page, functions identically to JS's native .indexOf(). So, it's simply testing whether the target hash matches the beginning of an anchor on the page. I imagine that this produces false positives rather rarely, but obviously not never. Does simply using Yair rand (talk • contribs) ping them? I'm not comfortable enough with the code to make the changes myself, both because of my jQuery impotence, and because there might be a reason for the sloppiness that I'm not thinking of. -Atelaes λάλει ἐμοί 02:20, 13 April 2014 (UTC)[reply]

Fixed, I think. --Yair rand (talk) 07:18, 24 April 2014 (UTC)[reply]
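To make the suspected bug concrete: a prefix test finds "Dutch" at the start of the anchor Dutch_Low_Saxon, while an exact test does not. The gadget itself is JavaScript and this does not reproduce its code; the Lua below (kept in the same language as the other sketches here) only contrasts the two checks, with hypothetical function names and a made-up anchor list for a page like veur.

```lua
-- Illustration only: made-up anchor list for a page with a
-- "Dutch Low Saxon" section but no "Dutch" section.
local anchors = { "Dutch_Low_Saxon", "Etymology" }

-- Prefix test: true if any anchor merely starts with the target.
local function has_prefix_match(list, target)
	for _, id in ipairs(list) do
		if id:sub(1, #target) == target then
			return true
		end
	end
	return false
end

-- Exact test: true only if some anchor equals the target.
local function has_exact_match(list, target)
	for _, id in ipairs(list) do
		if id == target then
			return true
		end
	end
	return false
end
```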

Loophole, sort of

It wouldn't let me edit User:Mglovesfun/to do/French/verb forms needing attention because I'm editing another user's page. But it would allow me to move it, thus bypassing the issue altogether. I wouldn't call this a bug because it's expected behavior. Can we call it a loophole? If so, does it actually need closing or can we just undo any bad moves? Renard Migrant (talk) 11:40, 14 April 2014 (UTC)[reply]

If I recall correctly, the original purpose of this edit filter was to prevent vandals from thrashing user pages and feeding bots with bad data. Nobody actually thought about moving pages to bypass that, including vandals, so it was not put in. If we want to maintain this filter, I guess we have to close this loophole now that the cat is out of the bag.

Though personally, I would drop the filter altogether, if only to counter the "vandals must be stopped at literally any cost" mentality I sometimes perceive in regulars, and to remove the impression that userspace is a sacred property of the user. Most of the time I hear about this filter, it is preventing someone from doing something useful. The abuse log also does not seem to contain evidence of any spectacular vandalism prevented. — Keφr 12:17, 14 April 2014 (UTC)[reply]

I believe it was prompted by a rash of spam left by bots on inactive users' pages. Most of the actual vandalism is against admins, who can protect their own pages. Chuck Entz (talk) 13:18, 14 April 2014 (UTC)[reply]

Broken Swedish declension template

Why is Template:sv-noun-reg-ar broken? All eight fields are blank. See for example biltvätt. Can one do "related changes" on a template and see all related changes in low-level templates that it calls? --LA2 (talk) 20:09, 17 April 2014 (UTC)[reply]

The forms are only shown if {{isValidPageName|{{{sg-nom-indef|}}}}} and so on are true. But those parameters (sg-nom-indef and such) are normally empty, so they are not valid page names and the test fails. —CodeCa t 20:49, 17 April 2014 (UTC)[reply]

It used to work fine. What has changed? LA2 (talk) 19:59, 18 April 2014 (UTC)[reply]

Perhaps your edit in isValidPageName on April 2 is guilty? So either you can change it back (it wasn't broken, it worked fine) or you can fix the Swedish templates that use this template? LA2 (talk) 20:04, 18 April 2014 (UTC)[reply]

It wasn't at all fine, and reverting those edits would certainly break a lot of things. Read the "Recent layout changes" thread above. —CodeCa t 21:48, 18 April 2014 (UTC)[reply]

I was not involved in designing these templates, and I have no idea why they use isValidPageName in the first place. My involvement is limited to adding Swedish words. I have no intention of getting involved in overall layout or template architecture politics. I'll leave it as broken as it is and possibly abandon Wiktionary altogether if things are going to become broken in this way. It was fun while it lasted. --LA2 (talk) 21:54, 18 April 2014 (UTC)[reply]

Repeat after me: templates are not data and I should really just relax. DTLHS (talk) 22:26, 18 April 2014 (UTC)[reply]

Guys I've fixed it --kc_kennylau (talk) 02:26, 19 April 2014 (UTC)[reply]

You fixed nothing, you just cheated around the problem by making {{isValidPageName}} treat empty strings as valid page names, which they're obviously not. I've reverted. —CodeCa t 02:29, 19 April 2014 (UTC)[reply]

The template works that way, treating empty strings as valid pagename. --kc_kennylau (talk) 03:17, 19 April 2014 (UTC)[reply]

Why don't you actually fix Template:sv-noun-reg-ar? That's the template that's really broken, Template:isValidPageName is fine. —CodeCa t 12:32, 19 April 2014 (UTC)[reply]

I've fixed the template, apparently because everyone else is too busy arguing with me. —CodeCa t 12:40, 19 April 2014 (UTC)[reply]

@CodeCat ~~You have only fixed one template. All these templates use the "bug" that isValidPageName with an empty string returns valid.~~ Moreover, {{isValidPageName|aksdjfkasjglkjas}} returns "valid". --kc_kennylau (talk) 13:39, 19 April 2014 (UTC) deleted my line after checking --kc_kennylau (talk) 13:43, 19 April 2014 (UTC)[reply]

{{isValidPageName}} is meant to be used when a particular page name is allowed, not when it exists. For the latter we already have {{#ifexist:}}. The template is marked as deprecated because in most cases where it has been used in the past, it was to allow people to include wikilinks in a parameter without breaking things. {{l}}, {{head}} and other Lua-enabled templates can now handle such cases flexibly, so this template isn't really needed anymore then. —CodeCa t 13:44, 19 April 2014 (UTC)[reply]
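The distinction between a page name being *allowed* and a page *existing* can be sketched in Python. This is a simplification, not the template's actual code. It treats a name as valid if it is non-blank and contains none of the characters `# < > [ ] | { }`, which MediaWiki forbids in titles. The real title rules cover further cases, such as control characters and length limits. The `existing_pages` set and both function names are hypothetical.

```python
# Characters MediaWiki does not allow in page titles (simplified subset).
ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def is_valid_page_name(name):
    """Roughly what {{isValidPageName}} tests: could this string be a page title?"""
    stripped = name.strip()
    if not stripped:
        return False  # an empty string is never a valid title
    return not any(ch in ILLEGAL_TITLE_CHARS for ch in stripped)


def page_exists(name, existing_pages):
    """Roughly what {{#ifexist:}} tests: is there actually a page by this name?"""
    return name in existing_pages


existing_pages = {"hello", "biltvätt"}  # hypothetical sample

for candidate in ["hello", "aksdjfkasjglkjas", "", "[[hello]]"]:
    print(repr(candidate), is_valid_page_name(candidate),
          page_exists(candidate, existing_pages))
```

An omitted parameter such as `sg-nom-indef=` expands to an empty string, which fails the first test. So a template that only displays a form when `{{isValidPageName|{{{sg-nom-indef|}}}}}` succeeds shows nothing for that cell.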

Maybe I was deceived by my pride. --kc_kennylau (talk) 13:49, 19 April 2014 (UTC)[reply]

Various sections at 人

Something's gone wrong with the language and other sections, starting at the Mandarin section and below. --Anatoli (обсудить/вклад) 09:48, 18 April 2014 (UTC)[reply]

An incompatible mixture of template types had been used in the derived terms section. Fixed. SemperBlotto (talk) 10:03, 18 April 2014 (UTC)[reply]

Thank you! --Anatoli (обсудить/вклад) 10:20, 18 April 2014 (UTC)[reply]

Module errors in Proto-Germanic entries

I'm not sure why, but I think this edit [1] is causing module errors in some Proto-Germanic entries. See Appendix:Proto-Germanic/abraz and Appendix:Proto-Germanic/agluz for examples. Anglom (talk) 15:19, 19 April 2014 (UTC)[reply]

@Anglom Look at the whole function and you'll know why: the variable "frame" is not even defined in the function! Change line 51 to "return export.tag_text(frame, text, lang, sc, face)" and line 55 to "function export.tag_text(frame, text, lang, sc, face)" to solve the problem. --kc_kennylau (talk) 15:56, 19 April 2014 (UTC)[reply]

I can't edit it. I would appreciate it if someone else would, though. Anglom (talk) 16:09, 19 April 2014 (UTC)[reply]

@-sche, Angr, CodeCat, SemperBlotto, Kephir, Wikitiki89 sure. --kc_kennylau (talk) 16:14, 19 April 2014 (UTC)[reply]

The problem is something that CodeCat overlooked in trying to create tracking templates. --Wiki Tiki 89 16:22, 19 April 2014 (UTC)[reply]

~~What do you mean by overlooked? What's the problem created?~~ Moreover, would the edit that I suggested solve the problem? (I mean using the preview function in the module) --kc_kennylau (talk) 16:25, 19 April 2014 (UTC)[reply]

Never mind, I found the answer above. --kc_kennylau (talk) 16:27, 19 April 2014 (UTC)[reply]

Diacritic differentiation in search box

Recently, perhaps as part of the Cirrus update, Greek letters with diacritics are all treated as separate letters in the search box as well as in the page search proper. E.g. typing "ά" into the search box now offers as completions only words that begin with "ά", and not "α ἀ ἁ ᾶ ᾳ ᾅ" &c. Likewise, searching for "ὅρος" yields only "ὅρος", and not "ὄρος" or "ὀρός". I find the old behaviour preferable, for two reasons.

First, typing Ancient Greek diacritics is pretty much universally more difficult than typing the plain letters. I do a fair amount of contributing on an iPad (not by choice, it's the only tool I have access to for much of the day) and typing anything but a tonos when not editing requires copying it from elsewhere. Before, I could just type in the word sans diacritics in the search box, then click the right suggestion. Now the fastest way to navigate is to make a Google search.

Second, it's quite common to forget the exact diacritics or accent placement of an Ancient Greek word, and this change makes it significantly more difficult to navigate.

I can understand the advantage of such a change, but I really don't think we have enough entries in either Ancient or Modern Greek for such an advantage to be worth the problems I have specified.

Conclusion (tldr): The search tool now treats "α ἀ ἁ ᾶ ᾳ ᾅ" etc. as separate letters (as opposed to its treatment of "e ë é" etc.), which makes navigation much more difficult. Is it possible to return to the original behavior? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 15:31, 19 April 2014 (UTC)[reply]

As a sidenote, I personally don't like the new search engine AT ALL, because when I use my mobile to view this website, I can't go to a page directly using the search box, I have to go through a searching page first. --kc_kennylau (talk) 15:50, 19 April 2014 (UTC)[reply]

@Kc kennylau That's strange. I can still go to search results directly on my mobile. Are you not getting the drop down overlay of search results when you type the search term? What type of phone are you using? Kaldari (talk) 21:50, 10 June 2014 (UTC)[reply]

@Kaldari I can get to the page directly if I click the drop-down menu, but pressing Enter takes me to the search page. Moreover, sometimes my phone goes to the previous page when I click the search box, which is a behaviour I find strange. --kc_kennylau (talk) 05:04, 11 June 2014 (UTC)[reply]

@Kc kennylau I've opened a bug on Bugzilla for further discussion. Feel free to chime in there. Cheers. Kaldari (talk) 00:01, 12 June 2014 (UTC)[reply]

term (and generic question) about named/positional param ordering

I'm looking at the term template specifically, but this could apply to any template that has precedent but no written standard. I look at simple (maybe old?) term templates and see term|lang=<lang>|<gloss>, but this is not normal; in fact, out of hundreds of thousands of term entries that I've parsed, only 6000+ have this ordering.

It seems that the correct ordering is term|<term>|<gloss>|lang, and if <gloss> is not there, that's ok, but it should be represented like term|<term>||lang. Is this correct?

I'm asking because I've been parsing data out of a recent XML dump for a personal project, and the non-standard usages of term are driving me crazy. Rather than code a whole section of workarounds, I'd rather fix this at the source. I know that named parameters appear after the positional parameter they are supposed to modify, but that's not the case above: lang comes directly after term.

It doesn't render incorrectly, which makes me think there is no standard for where the lang parameter should go. Is this correct? Or should I go through and change all these term entries as I come across them? I know it's not a priority and won't affect anything, but it's breaking my parser and causing me lots of headaches. Thanks! — This comment was unsigned.

To give you a quick answer before someone who actually is good at templating gives the full story, here is my understanding:

{{term}} has three positional parameters: 1: the term itself, 2: a piped alternative which appears instead of the term itself, 3: a gloss. I suppose there could be more but I haven't seen them. lang (language), sc (script), tr (transcription) are the three most common named parameters, but there might be more of them too. Named parameters can appear anywhere. Position 1 for the positional parameters refers to the first parameter slot that does not have a "name=" in it. The full story can be found at MediaWiki Help:Templates.

For {{term}} the second slot is very commonly not used, but the empty position is mandatory to make sure the correct interpretation is given to the gloss (3rd) parameter. DCDuring TALK 22:32, 20 April 2014 (UTC)[reply]

{{term}} also has two other parameters: "pos" for grammatical part of speech (aka, word class) and "lit", for literal gloss of a term that presumably also has a non-transparent gloss. Also, whether we need it or not most of the operative code is in Lua modules, as is the case with {{term}}. Just a little something we do, together with CSS and JS to help make the whole thing even less transparent. DCDuring TALK 22:44, 20 April 2014 (UTC)[reply]

That's what I thought; the named parameters don't give me grief, it's just the positional parameters when they conflict with named params :) I've been using this page as a reference - https://en.wiktionary.org/wiki/Template:term. Would it be appropriate for me to change this {{term|lang=mul||ᚫ|tr=a|ansuz}} to this {{term||ᚫ|tr=a|ansuz|lang=mul}} (where lang is the final param)? I'd rather fix it here than code around it; it's rare to see it like this. Panikal (talk) 22:58, 20 April 2014 (UTC)[reply]

Named parameters (with a name and an equals sign) are not ordered, they can be placed anywhere among the others. Even this is valid: {{term|tr=a|3=ansuz|lang=mul|2=ᚫ}}. So there is no reason to change what you propose, it's simply part of the wiki to allow different ordering of parameters. —CodeCa t 23:41, 20 April 2014 (UTC)[reply]
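For anyone parsing dump text, the numbering rule can be sketched in Python. This is a minimal sketch that assumes the template call is flat, with no nested templates, links or pipes inside values, so a naive split on `|` is safe. Any argument containing `=` is treated as named, including explicit numeric keys like `2=`. Every other argument takes the next positional number. For real dump text, a proper parser such as mwparserfromhell handles nesting.

```python
def split_template(call):
    """Split a flat '{{name|arg|...}}' call into its name and a parameter dict."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call: " + call)
    name, *args = text[2:-2].split("|")
    params = {}
    position = 0
    for arg in args:
        key, has_equals, value = arg.partition("=")
        if has_equals:
            # Named argument, including explicit numbers such as "2=".
            # MediaWiki trims whitespace around named keys and values.
            params[key.strip()] = value.strip()
        else:
            # Unnamed argument: it takes the next positional slot, untrimmed.
            position += 1
            params[str(position)] = arg
    return name.strip(), params


for call in [
    "{{term|lang=mul||ᚫ|tr=a|ansuz}}",
    "{{term||ᚫ|tr=a|ansuz|lang=mul}}",
    "{{term|tr=a|3=ansuz|lang=mul|2=ᚫ}}",
]:
    print(split_template(call))
```

The first two calls yield the same parameter dict. The third yields the same values for `2`, `3`, `lang` and `tr`, only without an empty `1`. Where `lang=` appears in the call makes no difference.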

Ah ha! That was the hint I needed. Thanks! I've looked around for that sort of universal rule but hadn't found it - the pattern I was seeing was that all the pages follow the template help pages, so I assumed it was a general rule, with some workarounds to render bad templates. ;) I'll change my approach for this, then. Thanks again! Panikal (talk) 23:53, 20 April 2014 (UTC) Edit - Only six thousand individual lines of the latest articles dump have 'bad templates' by what I thought was the definition, so it's almost a de facto standard... :)[reply]

Often templates have certain orderings of parameters that are more common, but it's up to the editor to decide what they want to use and nothing is really standardised, nor does it need to be. Usually lang= is put last, but I often find myself writing {{IPA|lang=...|/word/}} by force of habit. If you're looking to parse wiki code, you may want to give mwparserfromhell (for Python) a try. It's very good. —CodeCa t 00:08, 21 April 2014 (UTC)[reply]

*Not* having to worry about 'exceptions' of the named params being in place of the positional params allowed me to change my approach and reduce the complexity of that part of the code by about 50%. I hope it doesn't change. ;) Thanks again for your help! Panikal (talk) 01:13, 21 April 2014 (UTC)[reply]

Middle Vietnamese

I'd like to add some entries for common words in Middle Vietnamese. They'll all have citations ranging from the 17th century to the early 19th century. Before I begin, I have a couple questions:

Should I put these words under a new "Middle Vietnamese" section and category? There doesn't seem to be an ISO 639 code for Middle Vietnamese, yet it doesn't feel right to put these words under a Vietnamese section, even with an "archaic" label, because there are substantial differences in orthography, grammar, and vocabulary.
What should I do about words that can't be represented in Unicode? For example, I can fake the u+apex+tilde in "cu᷄̃" (cũng) with U+1DC4. And "ꞗào" (vào) uses a letter slated for the next release of Unicode.

– Minh Nguyễn (talk, contribs) 07:13, 21 April 2014 (UTC)[reply]

We can certainly create our own code for Middle Vietnamese, as we have for many other languages that don't have an ISO 639 code. I'd suggest mkh-mvi. Until Unicode can accommodate Middle Vietnamese, I suppose approximations like U+1DC4 are the way to go, but we can't use images as substitutes, since they obviously can't be accommodated in page names. Until the "b with flourish" is part of Unicode, I'd suggest using some existing Unicode character like ƀ as a substitute. —Aɴɢʀ (talk) 10:51, 21 April 2014 (UTC)[reply]

U+A797 ꞗ Latin small letter B with flourish is available in Unicode 7.0 Beta, which is expected to be released in July. With the release so close, maybe we might as well use the actual character. (I included it in a font a couple years ago.) – Minh Nguyễn (talk, contribs) 09:33, 22 April 2014 (UTC)[reply]
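The code points discussed here can be checked with Python's standard `unicodedata` module. This is a minimal sketch under two assumptions. First, the faked "cu᷄̃" is encoded as c + u + U+1DC4 + U+0303. U+0303 COMBINING TILDE is my reading of the tilde, since the thread only names U+1DC4. Second, the Python build's Unicode tables are version 7.0 or later, so that U+A797 is known.

```python
import unicodedata

# Approximation of the Middle Vietnamese "cu᷄̃": u + macron-acute + tilde.
approx_cung = "cu\u1dc4\u0303"
b_flourish = "\ua797"  # ꞗ, as in "ꞗào" (vào)

print("Unicode data version:", unicodedata.unidata_version)
for ch in approx_cung[2:] + b_flourish:
    # Code point, official name, and canonical combining class (0 = base letter).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# No precomposed character covers u + macron-acute + tilde, and the tilde cannot
# fuse with u because the macron-acute (same combining class) sits in between,
# so NFC normalization keeps four separate code points.
print(len(unicodedata.normalize("NFC", approx_cung)))
```

Because the sequence survives NFC unchanged, a page title built this way stays stable. The cost is that it renders correctly only in fonts that position the stacked marks properly.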

It'd be great to have these. Presumably, they will be in Quốc Ngữ script, as attested since Rhodes' dictionary? How would you decide on the boundary between Middle Vietnamese and Modern Vietnamese? Wyang (talk) 09:39, 22 April 2014 (UTC)[reply]

Yes. Of course the primary script in that time was still chữ Nôm, but it shouldn't be a problem logistically until we start finding examples of archaic Nôm usage.

I probably won't quote de Rhodes's dictionary directly, per Wiktionary guidelines. However, his 1651 Catechism has a wealth of material. I've been citing it in places like Citations:bánh. The other major source I have is Philipphê Bỉnh's handwritten Sách sổ sang chép mọi việc (1822). You can find scans of the first couple pages here. Bỉnh interestingly sticks to de Rhodes's orthography (minus the B with flourish) even as his contemporaries have moved to something largely identical to today's Vietnamese alphabet.

Come to think of it, it may be a stretch to represent Bỉnh's early 19th century Vietnamese as "Middle Vietnamese", even if he's using a Middle Vietnamese orthography. Do you think it'd be better to treat Middle Vietnamese as a chronolect (as something like "vi-mid" in Module:etymology language/data)?

– Minh Nguyễn (talk, contribs) 06:41, 24 April 2014 (UTC)[reply]

@Mxn I've added a code for Middle Vietnamese — mkh-mvi, per Angr's suggestion. It can be used in entries; I've just switched Ꞗ, ꞗ, trời and trên to use it. :) - -sche (discuss) 21:17, 14 August 2014 (UTC)[reply]

Template:ja-usex issue

Can someone please help fix this problem with the usex? It falls over exactly on the string "お聞き"/"おきき":

あのう，ちょっとお聞(き)きしたいんですが。駅(えき)はどこでしょう。

anō, chotto okiki shitai n desu ga. Eki wa doko deshō.

Excuse me, but could you tell me where the station is?

It's commented out at あのう. --Anatoli (обсудить/вклад) 12:48, 23 April 2014 (UTC)[reply]

Fixed. — Keφr 15:54, 23 April 2014 (UTC)[reply]

Dziękuję bardzo! (Thank you very much!) --Anatoli (обсудить/вклад) 22:48, 23 April 2014 (UTC)[reply]

Protecting word of the day

What happened to the word of the day being protected from non-admin edits? --Wiki Tiki 89 21:36, 24 April 2014 (UTC)[reply]

I unprotected the pages mainly because it makes bot maintenance more difficult. They're still semi-protected though. I don't know if full protection is really necessary... it's kind of overused on Wiktionary. —CodeCa t 22:06, 24 April 2014 (UTC)[reply]

Then how do you explain the anon edits at צ׳יק צ׳ק when it was word of the day (on April 24, 2014)? --Wiki Tiki 89 05:44, 25 April 2014 (UTC)[reply]

That is not vandalism. The letter tsade can be transcribed as ch or ṣ. The letter qoph can be transcribed as k or q. --kc_kennylau (talk) 05:46, 25 April 2014 (UTC)[reply]

Did I mention vandalism? It was an honest attempt to correct the transliteration, only it was wrong for two reasons: (1) we generally don't use that transliteration system, especially for words coined in modern times, and (2) the gereshes (the apostrophe-like symbols) indicate that it is pronounced /tʃ/, so it has no relation to either the modern /ts/ or the historical /sˤ/. Anyway, words of the day should not be edited while they are featured, to avoid both accidental and intentional mistakes. --Wiki Tiki 89 05:55, 25 April 2014 (UTC)[reply]

I'm sorry, I thought you meant when they were no longer featured. While featured, they're automatically protected because the Main Page has cascade protection on it. —CodeCa t 12:16, 25 April 2014 (UTC)[reply]

No, they're not. צ׳יק צ׳ק was edited by an anon while it was on the Main Page, and if I log out now, I could edit dictablanda (I just tried but didn't save any changes). —Aɴɢʀ (talk) 12:35, 25 April 2014 (UTC)[reply]

Oh, I was mistaken again. I don't think we ever protected such pages then. Cascade protection only applies to transclusion, anyway. We could make it automatic if we transcluded the page onto the main page... —CodeCa t 12:51, 25 April 2014 (UTC)[reply]

Before I was an admin, I remember being annoyed that I couldn't edit pages while they were word of the day. --Wiki Tiki 89 12:53, 25 April 2014 (UTC)[reply]

...which is why you want other users annoyed now? (And I think we did not have Foreign Word of the Day back then.) — Keφr 15:28, 25 April 2014 (UTC)[reply]

I get annoyed at traffic lights also, that doesn't mean we should get rid of them. --Wiki Tiki 89 23:10, 25 April 2014 (UTC)[reply]

Modified Special:AbuseFilter/24

I have modified AF 24 as it didn't allow stewards to tag a user page as spam. You may wish to consider whether that is truly an appropriate filter as much of current spambot abuse is creation of spam user pages. I have some reasonable filters available to grab/prevent some of that crap if you need it. Meta filters 69 and 72. — billinghurst sDrewth 12:03, 25 April 2014 (UTC)[reply]

At the very least, allow any user to tag a page with a {{delete}}. --Glaisher (talk) 04:56, 27 April 2014 (UTC)[reply]

You can always tag the talk page with a note like "delete user page" Chuck Entz (talk) 05:14, 27 April 2014 (UTC)[reply]

Sure, will keep this in mind. But wouldn't it be nicer if there were an exception for deletion tagging in that filter? --Glaisher (talk) 05:21, 27 April 2014 (UTC)[reply]

Added !(new_wikitext rlike ".*\{\{(q?d|[dD]b|[dD]elete|speedy)[}|].*").—msh210℠ (talk) 07:19, 28 April 2014 (UTC)[reply]

Couldn't a smart vandal now type "<nowiki>{{d}}</nowiki> I IZ HATE U"? --Wiki Tiki 89 07:29, 28 April 2014 (UTC)[reply]

Sure. Abuse filters are to filter out edits so human abuse-spotters have an easier job. They'll never filter all abuse.—msh210℠ (talk) 07:36, 28 April 2014 (UTC)[reply]
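The exemption added to the filter can be exercised outside AbuseFilter with Python's `re` module. This is only a sketch. It assumes Python's regex dialect agrees with AbuseFilter's `rlike` for this particular pattern, and the helper name is hypothetical.

```python
import re

# Pattern from the filter condition; the filter applies only when this does NOT match.
DELETION_TAG = re.compile(r".*\{\{(q?d|[dD]b|[dD]elete|speedy)[}|].*")


def exempt_from_filter(new_wikitext):
    """True if the new text contains one of the deletion-request template calls."""
    return DELETION_TAG.search(new_wikitext) is not None


for text in ["{{delete|spam}}", "{{qd}}", "{{db-spam}}",
             "{{d}} I IZ HATE U", "Buy cheap watches"]:
    print(repr(text), exempt_from_filter(text))
```

The fourth sample is the loophole raised above: any text that also contains `{{d}}` is exempt. Note also that `{{db-spam}}` is not exempted, because the pattern requires `}` or `|` immediately after the template name.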

Links to the help page

I've noticed that the "Help" link on the side bar brings users to a page on MediaWiki.org. Could someone please add "Help:Contents" to MediaWiki:Helppage to make the link on the sidebar more relevant? Whym (talk) 12:29, 25 April 2014 (UTC)[reply]

On a related note, could someone please replace two instances of "Wiktionary:Help" with "Help:Contents" in the main page? The former redirects to the latter. It is a little bit awkward for a link on the main page that is supposedly helpful. Whym (talk) 12:29, 25 April 2014 (UTC)[reply]

Can any administrator please do this? It will only take a few minutes, I suppose. Whym (talk) 08:32, 3 May 2014 (UTC)[reply]

Done. —Ruakh_TALK 09:37, 3 May 2014 (UTC)[reply]

MediaWiki cannot process comments when a template is calling a module if the module is subst'ed

If I put this {{subst:#invoke:User:kc_kennylau/sandbox|bug}} in a template and subst it like this {{subst:User:kc_kennylau/bug}}, it will generate a script error saying that the function "bug" does not exist. However, the comment is supposed to be ignored. This bug can only be reproduced if both the module call in the template and the template call here are substituted. (note: the [NEWLINE] is a new-line character U+000A) --kc_kennylau (talk) 15:20, 25 April 2014 (UTC)[reply]
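The expected behaviour, in which an HTML comment (even one spanning a newline) is discarded before the function name is read, can be sketched in Python. This is a simplified model, not MediaWiki's preprocessor. It assumes every comment is properly closed, and `strip_comments` and `invoke_function_name` are hypothetical helpers.

```python
import re

# Non-greedy, with DOTALL so that a comment spanning a newline is removed too.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(wikitext):
    """Remove every closed <!-- ... --> comment from the text."""
    return COMMENT.sub("", wikitext)


def invoke_function_name(call):
    """Return the function name from a '{{...#invoke:Module|function|...}}' call."""
    inner = strip_comments(call).strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    return inner.split("|")[1].strip()


call = "{{subst:#invoke:User:kc_kennylau/sandbox|bug<!--\n-->}}"
print(invoke_function_name(call))  # bug
```

Under this model, the function name comes out as plain "bug". The reported error suggests the comment is not being stripped in the subst-within-subst case.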

I don't think there's much we can do about that here. You should probably report the bug to MediaWiki directly. —CodeCa t 15:45, 25 April 2014 (UTC)[reply]

It turns out that this bug has already been reported, and this bug affects not only #invoke, but other parser functions as well. --kc_kennylau (talk) 15:58, 25 April 2014 (UTC)[reply]

JavaScript edit request for WT:EDIT

Previous discussions: Wiktionary:Beer parlour/2014/January#Proposal to change how translation checks and requests are formatted, Wiktionary:Grease pit/2014/February#Can someone update WT:EDIT for the "new" translation check format?

A request was made for this change before, but nothing happened then, so I'm re-submitting it.

For the new translation format with {{t-check}} and {{t-needed}}, some changes need to be made to WT:EDIT. It has to understand both the old and the new format before we can make any other changes, otherwise things will start breaking when it no longer understands our translation tables. I'm not familiar enough with WT:EDIT to trust myself to make the changes properly. Can someone do it? —CodeCa t 13:30, 26 April 2014 (UTC)[reply]

Language name and code templates

Do we have templates that will convert a language code into the corresponding name and vice versa? In other words, templates where entering {{template1|de}} will return German and entering {{template2|German}} will return de? If not, should we make some? Ideally they should work for protolanguages and language families as well, so {{template1|gem}} will return Germanic and {{template1|gem-pro}} will return Proto-Germanic. —Aɴɢʀ (talk) 16:24, 27 April 2014 (UTC)[reply]

There's {{#invoke:languages/templates|lookup|(lang)|getCanonicalName}}. —CodeCa t 16:34, 27 April 2014 (UTC)[reply]

Yeah, but that's way too complicated to remember, and it doesn't do families, and it only goes from code to name, not the other way round. Anyway, I take your answer to mean no, we don't have templates that do this, so the question is, do we want such templates? (I do, but maybe I'm the only one.) —Aɴɢʀ (talk) 17:59, 27 April 2014 (UTC)[reply]

I don't see why we would need them. The code above may be too long but it's only to be used in templates anyway. And no template ever needs more than just language code > name support. We have almost none that need the reverse ({{ttbc}} and {{trreq}} are the only ones that use {{langrev}} that I know of), nor are there very many that need to do anything with families (just {{derivcatboiler}}, as well as {{etyl}} which is Lua-based already). —CodeCa t 18:06, 27 April 2014 (UTC)[reply]

Well, we human editors may need them, even if only for subst'ing. When I'm tidying up translation tables, I may have the code but not the canonical language name. The xte gadget can help me there, but that means scrolling all the way back up to the top of the page. I'd rather just type {{subst:langname|hix}} or whatever and have it added automatically. And if I'm creating a new category like Category:Kinyarwanda terms from Hixkaryana, even if I know those language codes, I will still then need to create Category:Kinyarwanda terms from Cariban languages, and the xte gadget won't tell me the code for Cariban languages. So if I could type {{derivcatboiler|rw|{{subst:langcode|Cariban}}}} it would make my life a lot easier. —Aɴɢʀ (talk) 18:47, 27 April 2014 (UTC)[reply]

I created {{\}} for this use case. I should probably remember it more often… — Keφr 18:54, 27 April 2014 (UTC)[reply]

A backslash is a horrible name for a template. --Wiki Tiki 89 19:17, 27 April 2014 (UTC)[reply]

[E/C] That can certainly be done, and in fact {{langrev}} can already do this, except that it doesn't use Lua yet because of some obstacles, so the list of languages it works from is a bit outdated here and there. I don't disagree with substable templates to find names and codes, in principle, as long as we make sure it never gets used by templates or gadgets. I think a pair of substable templates, {{c2n}} and {{n2c}}, could be useful for editors. —CodeCa t 18:55, 27 April 2014 (UTC)[reply]

Ok, let's say I type in "Gaelic"- what should I expect back? Chuck Entz (talk) 21:16, 27 April 2014 (UTC)[reply]

I guess a module error since there is no code whose corresponding canonical name is Gaelic. Unless someone is clever enough to get the template to return a message like "The name you have input is ambiguous. Please select one of 'Goidelic languages', 'Irish', 'Manx' or 'Scottish Gaelic'." But there are lots of ambiguous names (Sami, Maya(n), etc.) for which such messages would have to be written, so leaving it as a module error would probably be easiest. —Aɴɢʀ (talk) 09:43, 28 April 2014 (UTC)[reply]
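A pair of substable lookups like the proposed {{c2n}} and {{n2c}} amounts to one table read forwards and backwards. Here is a minimal Python sketch. The five-entry table is hypothetical and uses only codes and names mentioned in this thread (with gem giving Germanic, as in the example above); the real data lives in Wiktionary's Lua modules. As suggested, a name that is not a canonical name, such as "Gaelic", simply raises an error.

```python
# Hypothetical miniature table; the real data lives in Wiktionary's Lua modules.
CODE_TO_NAME = {
    "de": "German",
    "gem": "Germanic",
    "gem-pro": "Proto-Germanic",
    "hix": "Hixkaryana",
    "rw": "Kinyarwanda",
}


def build_reverse(table):
    """Invert code -> name, refusing to build if two codes share a canonical name."""
    reverse = {}
    for code, name in table.items():
        if name in reverse:
            raise ValueError(f"{name!r} is the canonical name of both "
                             f"{reverse[name]!r} and {code!r}")
        reverse[name] = code
    return reverse


NAME_TO_CODE = build_reverse(CODE_TO_NAME)


def c2n(code):
    """Code to canonical name."""
    return CODE_TO_NAME[code]


def n2c(name):
    """Canonical name to code; anything else (e.g. 'Gaelic') is an error."""
    try:
        return NAME_TO_CODE[name]
    except KeyError:
        raise KeyError(f"no language or family has the canonical name {name!r}") from None


print(c2n("gem-pro"))     # Proto-Germanic
print(n2c("Hixkaryana"))  # hix
```

Refusing duplicate names when the reverse table is built keeps `n2c` unambiguous. Resolving informal names like "Gaelic" is left to the editor.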

I actually created a module for myself some time ago for creating categories. --kc_kennylau (talk) 10:02, 28 April 2014 (UTC)[reply]

2¢: like Angr, I would find a subst-able name-to-code converter useful when creating categories. - -sche (discuss) 16:57, 30 April 2014 (UTC)[reply]

And I use one (template:langrev, which, as mentioned above, is imperfect).—msh210℠ (talk) 20:44, 30 April 2014 (UTC)[reply]

Can we automate the import of genders for French plurals?

Category:French nouns with incomplete genders has several thousand entries thanks to MewBot (talk • contribs) deliberately breaking them. Gender does matter for noun plurals. For adjective plurals, at least in the definition line we put masculine plural of or feminine plural of so the gender template is kind of redundant (but still useful IMO, not all duplication is bad). Jesus, this really needs to be the English Wiktionary not the CodeCat Wiktionary. Renard Migrant (talk) 13:06, 29 April 2014 (UTC)[reply]

Maybe if you actually checked it, you'd see that it was MGlovesFun's bot that did it, not mine. —CodeCa t 16:22, 29 April 2014 (UTC)[reply]

No, check accueils: MewBot removed the gender and didn't put it in the edit summary. So uncontroversial that nobody could object? Quite the opposite: French nouns have gender, ugh. I dunno what to say; no wonder I don't participate in discussions anymore. Renard Migrant (talk) 11:36, 10 May 2014 (UTC)[reply]

Oddity in Special:New pages

If you go to Special:NewPages and select User as the Namespace, one of the entries listed is Appendix:Irish second-declension nouns. I'm pretty sure that it is not a User page. (It doesn't show up if you select Appendix as the Namespace.) Any ideas? SemperBlotto (talk) 07:17, 30 April 2014 (UTC)[reply]

I think this is how (the current) MediaWiki treats it. It is shown like this because it was in the user space as of its first revision and then moved to the appendix space. As indicated [2], the original page name was User:Angr/Irish second-declension nouns. Whym (talk) 08:18, 30 April 2014 (UTC)[reply]

Should I not have done it that way? I wanted it in my user space until it was ready to go, and then moved it without leaving a redirect rather than copy and pasting it into a new page. —Aɴɢʀ (talk) 08:36, 30 April 2014 (UTC)[reply]

~~It is no longer there.~~ Will it be assigned to its correct location when Wiktionary's Special pages get batch-updated (reindexed)? How often does that happen? DCDuring TALK 12:40, 30 April 2014 (UTC)[reply]

If it isn't corrected within 30 days or so of the namespace change it seems like a bug. DCDuring TALK 12:47, 30 April 2014 (UTC)[reply]

How long do pages stay at Special:NewPages at all? After a while, pages aren't new anymore. —Aɴɢʀ (talk) 13:23, 30 April 2014 (UTC)[reply]

The oldest in the queue is 30 days ago. Whym (talk) 14:09, 30 April 2014 (UTC)[reply]

Then if it isn't corrected within 30 days of the namespace change, no one will ever know. —Aɴɢʀ (talk) 17:59, 30 April 2014 (UTC)[reply]

It seems to have been reported: bugzilla:55866 and bugzilla:36930. Whym (talk) 14:09, 30 April 2014 (UTC)[reply]

30 days would be the maximum, 15 days would be the average, so about half of the items that would have had the bug would show it at any viewing of the page. DCDuring TALK 18:06, 30 April 2014 (UTC)[reply]