Wiktionary:Grease pit/2013/March

WT:CSS

I started a new page WT:CSS, because it was pointed out that our main style sheet is not documented. —Michael Z. 2013-03-01 19:34 z

translation tables slow

I think starting with the introduction of Web fonts (but I'm not sure), it now takes a long time for translation tables to become drop-down-able. (Newest Firefox for Mac; newest Firefox for Windows.) Is there anything to be done about this?—msh210℠ (talk) 06:42, 4 March 2013 (UTC)

I suspect that it's not the translation tables, per se, but just general slowness. I could be wrong, but I believe that pretty much nothing clickable becomes active until the page is finished being drawn. I've noticed slowness everywhere- not just on pages with web fonts. Chuck Entz (talk) 06:59, 4 March 2013 (UTC)[reply]

Over the years this Wiki has become slower and slower. Every time somebody adds a bit more cleverness, or adds complications to a template, or replaces simple text with a template allowing users to change its appearance, it gets a little bit slower. I think we should stamp down on added cleverness, and maybe roll back some we already have. KISS SemperBlotto (talk) 08:11, 4 March 2013 (UTC)[reply]

I wonder too how much past cleverness might have simpler and more elegant solutions, now that folks have a clearer idea what we want for the site. I know that some processes I've encountered at various job sites have fossilized from years and years of past half-formed ideas about what was required, and sitting down and looking at specific inputs and required outputs can often lead to a much more streamlined way of doing things. -- Eiríkr Útlendi │ Tala við mig 16:19, 4 March 2013 (UTC)[reply]

Aren't {{t}} and its relatives reasonable suspects for performance problems? Can't someone figure out a way for Lua to improve performance? The number of large and very large translation tables seems to be growing faster than average page size. DCDuring TALK 18:23, 4 March 2013 (UTC)

If Scribunto really can make templates like {{t}}, {{context}} and other big templates load faster, it could be a very useful tool for improving the usability of our larger pages. Mglovesfun (talk) 18:55, 4 March 2013 (UTC)[reply]

Have you tried turning off WebFonts in the preferences? -- Liliana • 18:27, 4 March 2013 (UTC)[reply]

Good point. Okay, two more bits of information: (1) I think it only happens the first time I load an entry (any entry, not the specific one I'm looking at) in a cacheless browser. (2) I just tried it with WebFonts off and it didn't happen. (Or if it did then it was small enough a delay that I didn't notice it, which was not my experience previous times.)—msh210℠ (talk) 19:50, 4 March 2013 (UTC)[reply]

Just what I suspected: downloading all these fonts takes a long time and the browser is effectively frozen while it happens. This makes WebFonts a real nuisance. -- Liliana • 20:26, 4 March 2013 (UTC)[reply]

How do we make web fonts opt-in, on a per-language basis? My browser downloads 1.6 MB of web fonts on this site, but needs zero of them to display all of the languages on the Main Page. This is irresponsible for any website, much less one that should be accessible to people with poor network bandwidth. —Michael Z. 2013-03-04 20:42 z

Wow, I thought the whole webfonts feature only kicked in if the browser was missing a font required to display a given page. I had no idea it was causing downloads even when not needed. That's not good. -- Eiríkr Útlendi │ Tala við mig 20:49, 4 March 2013 (UTC)[reply]

It only loads a font when there's some text on the page that is set to use a specific font and the user doesn't already have that particular font on their computer, even if they already have some other font that could display the text, I think. --Yair rand (talk) 21:08, 4 March 2013 (UTC)[reply]

I've been having this issue since December. No idea what's causing it. --Yair rand (talk) 21:04, 4 March 2013 (UTC)[reply]

So who chose the web font files for Wiktionary? Are they the same fonts that we are specifying in MediaWiki:Common.css? —Michael Z. 2013-03-05 00:15 z

The WebFonts default settings, which we're currently using and can get changed through bugzilla, apply certain fonts to certain languages. Since this doesn't cover all of our uses, there are also some set from Common.css. --Yair rand (talk) 01:28, 5 March 2013 (UTC)[reply]

What are the default settings (the docs[1] only list all supported languages)? Which fonts set from Common.css? Where is our documentation for this? Which specific browser or OS inadequacies are we serving fonts for? —Michael Z. 2013-03-05 01:58 z

"Supported languages" seems to mean languages which fonts are served for by default. The only fonts set from Common.css are for {{Bugi}}, {{Ethi}}, and {{Mymr}}, I think. --Yair rand (talk) 02:05, 5 March 2013 (UTC)[reply]

How do I find out which ones? How were the requirements determined? I hope we don’t add another 300 kB to the page load just because one editor requests their favourite font. I can’t afford to have my mobile plan notch up a tier just because I visit Wiktionary five times during a month. —Michael Z. 2013-03-05 02:24 z

Modules can now be documented

There has been an update to Scribunto, which now automatically transcludes a module's documentation subpage at the top of the module page. Documentation subpages can be used to provide nicely-formatted documentation of the module, and also allow you to categorise it (put the category in <includeonly> tags). Documentation subpages are treated as "special" by the software: unlike subpages with other names, they are not interpreted as modules. The documentation subpage's name can be changed by editing MediaWiki:scribunto-doc-subpage-name. Its default value is "doc", but I've changed it to "documentation" per WT:RFM#Documentation subpages to /documentation. —CodeCat 22:19, 6 March 2013 (UTC)

A recent update to Scribunto [2] has changed the way documentation pages are handled; the setting is now at MediaWiki:Scribunto-doc-page-name instead. I've updated it accordingly. —CodeCat 16:33, 15 March 2013 (UTC)

updates to User:Conrad.Irwin/creation.js

This script generates sense-lines of the form {{form of|[[lemma]]|lang=foo}}. Unless I'm mistaken, it no longer needs to explicitly wikilink the terms, because the templates create links automatically and our page counter no longer relies on the presence of square brackets. Also: we could discuss whether to update it to use {{head|foo|partofspeech?}} rather than '''pagename'''. - -sche (discuss) 18:01, 9 March 2013 (UTC)[reply]
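To make the proposed change concrete, here is a minimal sketch of the generated wikitext with and without {{head}}, and without square brackets around the lemma. It is written in Python purely for illustration (the real script is JavaScript), and the function name `form_of_entry` and its parameters are hypothetical, not taken from creation.js.

```python
def form_of_entry(pagename, template, lemma, lang, use_head=True):
    """Build a headword line and a sense line for a form-of entry.

    The lemma is passed bare (no [[...]]), on the assumption stated above
    that the form-of template creates the link itself.
    """
    if use_head:
        headword = "{{head|%s}}" % lang
    else:
        headword = "'''%s'''" % pagename
    sense = "# {{%s|%s|lang=%s}}" % (template, lemma, lang)
    return headword + "\n\n" + sense


new_style = form_of_entry("leaves", "plural of", "leaf", "en")
old_style = form_of_entry("leaves", "plural of", "leaf", "en", use_head=False)
```

The part-of-speech argument to {{head}} is left out here because whether it is needed is still an open question in this thread.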

I would definitely agree with using {{head}}, although I'm not sure if a PoS is needed, since many form-of templates themselves already add categories. I don't know if that is desirable, but that's a separate question. Also, I think it would be a good idea to replace all existing cases where such raw links are still in use. Could someone make a list of all templates that still allow such usage? I can then add a cleanup category to them, and run a bot script to update all the usages so that we can finally abandon this "legacy". —CodeCa t 18:08, 9 March 2013 (UTC)[reply]

What is the advantage, apart from uniformity, to having {{head}} instead of using PAGENAME for, let's say, English? Why would we want to have such a vast number of transclusions of a single template? DCDuring TALK 18:47, 9 March 2013 (UTC)[reply]

For English, the advantage is that it is consistent with our intended coding of headwords elsewhere. There is somewhat of a consensus to move towards more CSS-based formatting combined with making better use of semantic HTML and classes rather than hard-coded formatting. One of those things is to write headwords as word, which we've already started doing for several templates and modules and which I would definitely consider a good thing. However, if we use plain bold text for English, then that would make English inconsistent with all other languages. —CodeCa t 19:04, 9 March 2013 (UTC)[reply]

Is it worth calling {{head}} on more pages for this reason? Doesn't the extra template call to a relatively large template slow down the loading of pages? Mglovesfun (talk) 20:04, 9 March 2013 (UTC)[reply]

It's not really a very large template, and when it's converted to Lua it will be quite a bit faster because Lua can easily support any number of optional parameters the way {{head}} uses them, without any significant slowdown. And anyway, {{head}} isn't really called that often per page... {{l}} is called, on average, more often within any single language section than {{head}} is called on any given page (to put that differently: most entries have more links than pages have entries). —CodeCa t 20:25, 9 March 2013 (UTC)[reply]

Yes, please get rid of square brackets. User:Mglovesfun/vector.js has a line (more than one in fact) to get rid of square brackets from templates that do literally nothing. Mglovesfun (talk) 20:30, 9 March 2013 (UTC)

Ok, then I would like to have a list of all the templates that currently contain code to allow raw-linking in their parameter. I have already noted {{form of}}, which is used by many other templates as well; it now adds entries to Category:Entries using form-of templates with a raw link. You can recognise the templates because they use {{isValidPageName}}. Come to think of it... are there any other uses for that template at all? —CodeCa t 14:05, 11 March 2013 (UTC)[reply]

Improving how module documentation currently displays

Currently, when a module needs documentation, it shows a link, like on Module:User:CodeCat. But most of the time, we only want/need the documentation page to put the module in a category, so once we create it, it ends up transcluding an empty page and looks like this: Module:eo-conj. I wonder if that could be improved, because it seems like a problem in a few ways. Firstly, there is no indication that anything at all has been transcluded, unlike what {{documentation}} displays. Secondly, there is no link to the documentation page itself; this would be fixed by fixing the previous problem, but a tab like we have on Template: pages would also be a good idea. And finally, it seems rather pointless for Scribunto to think that it has transcluded documentation. But all it has really transcluded is a category, so it ends up showing a horizontal rule with nothing above it, which leaves you to guess about what it means. —CodeCa t 18:04, 9 March 2013 (UTC)[reply]

List request.

Hi. I'm trying to ensure that all English plurals belonging in the categories Category:English plurals ending in "-ies", Category:English plurals ending in "-es", and Category:English irregular plurals ending in "-ves" are properly categorized. However, as we currently have 115,950 English plurals, weeding through that list is proving to be excessive. Can someone with the technical knowhow generate individual lists of all English plurals ending in "ies", "ses", "xes", "ches", "shes", and "ves", preferably limited to terms which are not already in the aforementioned categories? I will then plow through the lists and fix the ones which need to be categorized. (I suppose this could be automated entirely if someone could make a bot that understood that plurals like "waves" are normal formations while plurals like "pelves" are an "-es" formation and plurals like "wives" and "wolves" are a "-ves" formation). Cheers! bd2412 T 03:22, 11 March 2013 (UTC)
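For anyone picking up this request, the list generation could look roughly like the following Python sketch. It assumes the plural titles have already been pulled from a dump or category listing into a plain list. The names `ENDINGS` and `bucket_by_ending` and the sample words are made up for illustration.

```python
from collections import defaultdict

# The six endings requested above. No word can match two of them,
# so the order of the checks does not matter.
ENDINGS = ["ies", "ses", "xes", "ches", "shes", "ves"]


def bucket_by_ending(plurals, already_categorized=frozenset()):
    """Group plural forms by ending, skipping ones already categorized."""
    buckets = defaultdict(list)
    for word in plurals:
        if word in already_categorized:
            continue
        for ending in ENDINGS:
            if word.endswith(ending):
                buckets[ending].append(word)
                break
    return dict(buckets)


sample = ["ponies", "buses", "boxes", "churches", "dishes",
          "wolves", "cats", "waves"]
lists = bucket_by_ending(sample, already_categorized={"wolves"})
```

Note that "waves" still lands in the "ves" list: the ending alone cannot tell a regular plural from an irregular one, which is why the lists would need a manual pass.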

I noticed that you've been adding that category to many pages. However, when {{en-noun}} is converted to Lua, that will all become redundant, because Lua can easily perform the categorization itself, automatically. —CodeCat 13:44, 11 March 2013 (UTC)

I'm afraid I don't know what Lua is, or how it would perform such categorization. Although some of these pluralizations are predictable, it would need to know for example that "leaf" becomes "leaves" while "waif" becomes "waifs". bd2412 T 01:54, 12 March 2013 (UTC)[reply]

Re: what Lua is: See Wiktionary:Scribunto. Re: knowing that "leaf" becomes "leaves" while "waif" becomes "waifs": Well, technically, that information is already embedded in the templates; [[leaves]] contains {{plural of|leaf}}, for example. But I'm not sure how useful that fact is, since {{plural of}} is not English-specific, so we wouldn't really want to "contaminate" it with this sort of categorization information. (Though to be honest, I'm not sure these categories should exist, anyway; wouldn't it be better for [[leaf]] to be in Category:English nouns with irregular plurals in "-ves"? The latter, in addition to being preferable in general IMHO, is also doable by Luacizing {{en-noun}}.) —Ruakh_TALK 02:25, 12 March 2013 (UTC)
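Since the lemma is available from {{plural of|leaf}}, the distinction between "waves", "pelves" and "wolves" can be made by comparing the two forms. The Python sketch below shows one way to do that. The function name and the category labels are hypothetical, and the rules cover only the patterns mentioned in this thread.

```python
def classify_plural(lemma, plural):
    """Return a rough label for how `plural` was formed from `lemma`."""
    if plural == lemma + "s":
        return "regular -s"            # wave -> waves, waif -> waifs
    if lemma.endswith("f") and plural == lemma[:-1] + "ves":
        return "irregular -ves"        # leaf -> leaves, wolf -> wolves
    if lemma.endswith("fe") and plural == lemma[:-2] + "ves":
        return "irregular -ves"        # wife -> wives
    if lemma.endswith("is") and plural == lemma[:-2] + "es":
        return "-es (from -is)"        # pelvis -> pelves
    return "unclassified"              # leave for a human to look at
```

Anything the rules do not recognise is returned as "unclassified" rather than guessed at, so a bot built on this would still leave a residue for manual review.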

I don't see a conflict between having leaf in a category for nouns having a certain kind of irregular plural, and having leaves in a category for nouns being that kind of irregular plural. I think the categorization would be particularly useful, given that leafs exists (as a form of the verb, to leaf), and that similar instances occur of words existing that readers might mistakenly assume to be the regular plural form of words with irregular plurals. If someone would be so kind as to generate the aforementioned lists, I will gladly effect this categorization in a matter of hours. bd2412 T 02:52, 12 March 2013 (UTC)[reply]

Module:lang/legacy

I think the question of how we should really handle language-codes (etc.) is incredibly complex, because languages are incredibly complex, and there are a lot of just-slightly-independent dimensions (e.g. WMF language prefix vs. ISO language code vs. HTML language tag); but I don't think we should wait until we've hammered that stuff out (or even started hammering it out) before we start taking advantage of Scribunto.

So, how to take advantage of Scribunto, without hammering out the issues surrounding language codes?

One option is to require that language-manipulation be handled in template-space, before invoking Lua; so, for example, Template:context would call {{languagex}} to get the language-name for a given code, and would pass that in to the Scribunto module it uses. The problem with this option — or at least, one problem with this option — is that {{languagex}} is exactly the sort of expensive template that Scribunto is supposed to help us move away from.

Another option is just to create Module:lang now, with the intent of improving it later. The problem with this option is that any real improvements will probably require fundamental changes that will break everything that uses the module.

So instead, I'd like to suggest that we create Module:lang/legacy ("legacy" being a software-engineering term describing an old system that's still in use but does things in ways that are now considered less than ideal), with a more-or-less direct translation of what we've got now. It would then be pretty straightforward to Luacize existing templates without making any breaking changes to them; and then, at some glorious future date when Module:lang is ready, we can slowly modify these templates to take advantage of its luminous beauty.

Are people O.K. with that general approach? If so, I'll set about creating Module:lang/legacy, and will post back here for further feedback before we actually start using it.

—Ruakh_TALK 05:09, 11 March 2013 (UTC)[reply]

Isn't that more or less what Module:languages already does? It is pretty much a direct import of the language code templates, and I haven't made any other changes. —CodeCat 13:41, 11 March 2013 (UTC)

Yeah, I noticed that module later. (And I noticed that you hadn't started using it yet, presumably because you wanted to gather input first? If so, I appreciate your caution.) So basically what I'm proposing is (1) that Module:languages be moved to Module:lang/legacy (or Module:languages/legacy if you prefer); (2) that it be changed to match our current structure more precisely (e.g., proto: and so on); and (3) that it be a table of functions (corresponding to existing templates like {{langnamex}}) rather than of raw data. (The raw data could still be exported as p.data or something, but the current approach has the module only include raw data, which is unfortunate.) —Ruakh_TALK 15:12, 11 March 2013 (UTC)[reply]

I did post about it on the BP or GP (I don't remember which). And I haven't started using it because of the speed issues it has, which are discussed on the talk pages. However, the good news is that they've added a new function specifically for this case. It imports data as read-only, but allows it to be shared by all invocations on a page. So while a single use of that module is still somewhat expensive, it would never be imported more than once per page so it is not a problem. I'm not sure what the use would be of your proposal though. I realise that it would be for compatibility reasons, but even then I don't see the purpose of converting it into a table of functions. Also, one of the caveats with the read-only import is that the imported table can't contain functions, only raw data. —CodeCa t 16:50, 11 March 2013 (UTC)[reply]
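The "load once, share read-only" behaviour described here can be illustrated outside Scribunto. The following Python sketch is only an analogy, not the Scribunto function itself: the table is built a single time, every caller gets the same object, and writes are rejected. All names and the two data rows are placeholders.

```python
from functools import lru_cache
from types import MappingProxyType

LOAD_COUNT = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=None)
def load_language_data():
    """Build the table once and hand out a read-only view of it."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    raw = {"fr": ("French",), "de": ("German",)}  # placeholder data
    return MappingProxyType(raw)


first = load_language_data()   # pays the cost of building the table
second = load_language_data()  # served from the cache, no rebuild
```

This sketch does not enforce the "raw data only, no functions" restriction mentioned above. That caveat applies to the real read-only import.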

Re: "I did post about it on the BP or GP (I don't remember which)": I'm almost positive that you didn't. You did post about User:CodeCat/Module:lang, though, which may be what you're thinking of. Re: read-only import: Well then, the data can go in Module:lang/legacy/data. :-) —Ruakh_TALK 02:34, 12 March 2013 (UTC)[reply]

Ok, after thinking about it a bit more I think I understand. You are asking for a kind of "glue" module between old code and the language data. But in the case of {{languagex}} I don't see much of a point. After all, a Lua call like languages_legacy.languagex("fr") would just translate to languages["fr"][1]. There is an alternative though, if you like the idea of wrapper functions around raw data. Lua supports so-called metatables, which are tables that really have accessor functions behind them. Metatables, being functions, can't be included in a read-only module though. —CodeCa t 16:56, 11 March 2013 (UTC)[reply]

But languages_legacy.languagex("gem-pro") would translate to languages["proto:gem-pro"][1], because of the {{langprefix}} ugliness. (I'm quite seriously proposing that we reproduce exactly what we have now, including the stuff that no one likes, because there is still no agreement on how to improve that stuff. What I'm proposing is that we create a clearly-demarcated "legacy" area that allows us to migrate existing templates to Lua without breaking them.) —Ruakh_TALK 02:34, 12 March 2013 (UTC)[reply]
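The difference between the two lookups can be spelled out in a short sketch. It is written in Python for illustration, so the Lua index [1] becomes [0]. The rule used for deciding when to add the "proto:" prefix is an assumption made for this example, and the table holds only two placeholder rows.

```python
# Stand-in for the language data table; the real names and codes would
# come from the existing language templates.
LANGUAGES = {
    "fr": ("French",),
    "proto:gem-pro": ("Proto-Germanic",),
}


def langprefix(code):
    """Mimic {{langprefix}}: reconstructed languages live under "proto:".

    Testing for a "-pro" suffix is an assumption of this sketch.
    """
    return "proto:" if code.endswith("-pro") else ""


def languagex(code):
    """Legacy-style lookup: apply the prefix, then read the first field."""
    return LANGUAGES[langprefix(code) + code][0]
```

With this wrapper, callers keep passing "gem-pro" and never see the "proto:" key, which is the kind of behaviour a "legacy" module would have to preserve.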

Any comments? —Ruakh_TALK 04:31, 22 March 2013 (UTC)[reply]
It looks eminently wise to me, but I'm obviously so poor in wiseness reserves that I can't be trusted. Seriously, though, when you post about something in the GP and nobody really complains too much, it means that you might as well create it (I mean obviously you should post again before we actually use it, but that's another story). —Μετάknowledge^{discuss/deeds} 15:45, 25 March 2013 (UTC)[reply]
I have an alternative proposal which is somewhat similar. I don't think it's wise to call it "legacy" because we'll never really be able to get rid of it entirely. Certain gadgets and templates rely on being able to subst: language templates. Therefore, I propose that we create an extra module that acts as a glue between wiki-space and module-space. Something like {{subst:en}} would become {{subst:#invoke:languages/invoke|language_name|en}}. Templates like {{languagex}} and {{family}} would then simply contain such an invocation to "forward" the request to Lua. —CodeCat 15:57, 25 March 2013 (UTC)

I agree with your proposal, but I see it as complementing mine rather than as an alternative to it. · I agree that we'll probably always have certain language templates (or at least, that we don't currently intend to ever eliminate them all), but firstly, a lot of the details will hopefully change (do we really intend to keep {{langprefix}} forever?), and secondly, the underlying Lua support for them will really hopefully change. What is "legacy" here is the first pass at the Lua implementation: I hope that we will create a better Module:lang within the next year or two. Module:lang/legacy is the stopgap, the temporary glue that lets us migrate safely without sacrificing the long term. —Ruakh_TALK 14:38, 27 March 2013 (UTC)[reply]

I'm not really sure what you're saying, though. In what backwards-incompatible way do you think Module:languages would need to be changed? —CodeCat 14:59, 27 March 2013 (UTC)

I don't have very coherent thoughts yet, but our current system has a lot of inflexibility, and it has difficulty dealing with cases like als (which means "Tosk Albanian" when it's an HTML tag but "Alemannisch" when it's a subdomain of wikipedia.org) and Bosnian (which we mostly treat as part of Serbo-Croatian). I think we should consider decoupling some things that our current system wrongly assumes align one-to-one. (Some of these incorrect assumptions, we might decide to keep anyway, as valuable simplifications. But so far we haven't even really examined them very hard, because our system was too inflexible to make them seem conceivable.) —Ruakh_TALK 04:39, 28 March 2013 (UTC)[reply]

I can't think of anything like that, except maybe language families. Do we want languages to be able to belong to more than one family? The issue with Tosk/Alemannic doesn't really seem like an issue, any more than using "zh" as the subdomain when we have no such code has been an issue so far. I realise that you want to be cautious about it, but I get the feeling there really aren't any serious problems and you're trying to look for problems that don't exist "just in case". —CodeCat 18:23, 28 March 2013 (UTC)

I'm confused. I gave two specific examples, and you added another one (zh vs. cmn), so I don't understand what you mean when you say that you "can't think of anything like that". And I agree+disagree with you about Tosk/Alemannic: I agree that it's been exactly as much of an issue as zh vs. cmn, but I conclude that it is an issue rather than that it isn't one. More broadly — I think there are plenty of problems, and I can't imagine that you're blind to them. (You yourself have tried to make proposals that attempt to address some of these issues, and in fact, we've recently had arguments about them!) The only questions are (1) whether those issues can be fixed without breaking things, and (2) if so, whether some sort of migration strategy is needed; and in that respect, yes, I'm being cautious. I think that, thanks to the flexibility of Lua, we'll probably find that some of these problems won't require a migration strategy (e.g., if they can be fixed by adding additional data-points and tacking on or-expressions), and that some of them will. (This is not a matter of "trying to look for problems that don't exist", it's a matter of trying to prepare for solutions to acknowledged problems.) —Ruakh_TALK 19:24, 31 March 2013 (UTC)[reply]

I'm not blind to them, I just don't see how our current approach is so bad. How do we currently handle the zh-vs-cmn problem? We use a template that translates a Wiktionary code to a Wikimedia subdomain. This problem only really surfaces in cases where links need to be automatically generated, and there are only a few templates that need that. —CodeCat 19:57, 31 March 2013 (UTC)

I'm not saying "our current approach is so bad", I'm just saying it's a bit broken, and should be a bit fixed. ;-) Also, I guess we're avoiding talking about the elephant in the room. Neither of us is happy with {{langprefix}}, so why are you so reticent to view it as legacy code (at least in Lua)? —Ruakh_TALK 20:36, 31 March 2013 (UTC)[reply]

I didn't mention langprefix because to me it was self-evident that it wasn't going to be implemented in Lua. —CodeCat 20:39, 31 March 2013 (UTC)

Oh, sorry, then I think most of this discussion has been at cross-purposes. To be absolutely clear: My proposal is that we port our current behavior to Lua, but in a module that is clearly marked "legacy". (I actually thought I'd been clear about that — repeatedly — but apparently not.) My rationale for porting our current behavior is as follows: (1) I think we should start taking advantage of Lua ASAP for templates like {{context}}; (2) I think there are some aspects of our current behavior that we can all agree should be changed (e.g. {{langprefix}}); and (3) I think it will take some time to reach consensus on how to change them. My rationale for marking it "legacy" is as follows: (4) to CodeCat "it was self evident that [langprefix] wasn't going to be implemented in Lua" except, of course, in modules marked "legacy". :-) —Ruakh_TALK 20:48, 31 March 2013 (UTC)[reply]

And that is where I ask, again, what the point of such a module is if we're going to change it anyway. Why do all the extra work in porting something we know is legacy? I really don't understand. —CodeCat 21:04, 31 March 2013 (UTC)

I have explicitly said what I think the point is. (It's in the sentence that begins, "My rationale for porting our current behavior is as follows".) If you don't understand one of my premises, please say which one. If you disagree with one of my premises, please say which one and why. If you don't agree that my premises lead to my conclusion, please say so. (I mean, I suppose if you just want me to copy-and-paste my previous comment into a new one, I'm willing to do that, but it seems a bit strange.) —Ruakh_TALK 05:07, 1 April 2013 (UTC)[reply]

I still don't understand what you're saying. First, you say "we all agree that langprefix should go" and then you said "let's discuss possible ways to make it go". That makes no sense to me. If it goes, doesn't it just... well, go? Disappear? My confusion is that your statements appear to bring conclusions that contradict themselves. Either it goes or it stays in some form. And if we agree that it should go, what more details do we need to work out? —CodeCa t 13:45, 1 April 2013 (UTC)[reply]

There are currently a whole bunch of entries that use things like lang=gem-pro, but the actual language template-name is {{proto:gem-pro}}. Without langprefix to correct the mapping, all of those entries will break. You have your own view about how this should be fixed: you think that proto:gem-pro should be scrapped in favor of gem-pro. I do not agree with that view. (I don't think this is news.) —Ruakh_TALK 14:04, 1 April 2013 (UTC)[reply]

So instead you filibuster the whole thing by requiring us to add "legacy" modules, which are really only there for your own satisfaction? *sighs* It looks like once again this is an issue that is between you and me, and that nobody else actually finds interesting enough to comment on... —CodeCat 16:43, 1 April 2013 (UTC)

WTF?? You are such a hypocrite. You could just as well say that I'm proposing the "legacy" module for your satisfaction, because the alternative that I'd propose is one that you would dislike. I am offering to do all the work to preserve the status quo until a consensus is demonstrated. But no, you think that your own view is somehow magically perfect, and everyone would somehow magically agree with it, if only I would get out of your way. Sheesh, grow up. —Ruakh_TALK 20:37, 1 April 2013 (UTC)

Du calme, du calme. Ruakh, I happen to (with my highly limited knowledge) agree with you — but calling people hypocrites and the like won't help. (Sorry if that sounds patronising. I just think that having a heated argument over something this minor is only detrimental to Wiktionary.) —Μετάknowledge^{discuss/deeds} 00:58, 2 April 2013 (UTC)[reply]

From my point of view (i.e. fr.wiktionarist), you really should focus on getting rid of hacks like this langprefix thing if you want to be able to use the full power of Lua. You just need a module with language functions (get_name(s), get_script, get_type, get_family) with an associated data module (instead of thousands of inefficiently disseminated data in templates and subtemplates). Dakdada (talk) 17:38, 1 April 2013 (UTC)[reply]
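The layout suggested here, a small set of accessor functions over one data module, might look like the sketch below. It is written in Python for illustration only. The field names, the "type" values and the two sample records are hypothetical and not taken from the live templates.

```python
# Placeholder data "module": one record per language code.
DATA = {
    "fr": {"name": "French", "script": "Latn",
           "family": "roa", "type": "regular"},
    "gem-pro": {"name": "Proto-Germanic", "script": "Latn",
                "family": "gem", "type": "reconstructed"},
}


def _field(code, field):
    """Look up one field, returning None for unknown codes."""
    record = DATA.get(code)
    return record[field] if record is not None else None


def get_name(code):
    return _field(code, "name")


def get_script(code):
    return _field(code, "script")


def get_type(code):
    return _field(code, "type")


def get_family(code):
    return _field(code, "family")
```

Because callers only ever go through the four functions, the shape of the data table can change later without touching the templates that use it.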

form of template bug

{{feminine of|calmo#Adjective|calmo}}

displays

feminine of calmo

For historical purposes when that gets fixed, it is:

feminine form of calmo#Adjective

This syntax used to work, and I'm not sure why it doesn't. I guess... calmo#Adjective isn't a valid page name. Is my guess right? Mglovesfun (talk) 11:43, 11 March 2013 (UTC)[reply]

Oddly, I think it's actually the code that allows putting raw links into form-of templates that is the cause of this. You're right, it's not a valid page name, and that's what that code goes by to determine whether something is a raw link or just a page name. So it treats its parameter as if it were a raw link, except it's not a link. However, once that code is removed, it should work. On the other hand, the template is missing a language parameter, so that still needs to be fixed. Another thing to consider is that there are probably several #Adjective sections on any given page, so the current approach doesn't actually do what it's intended to do. What you really want is to link to the adjective section of whatever language it is, but I don't think that is currently possible. I think if we have to choose between linking to #Adjective and linking to #language, the latter is preferable. —CodeCa t 13:49, 11 March 2013 (UTC)[reply]
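The behaviour described here can be reproduced with a small sketch. It is written in Python and is only a guess at the logic, not the actual template code: the check below relies on the characters MediaWiki does not allow in page titles, and both function names are hypothetical.

```python
# Characters MediaWiki does not allow in page titles.
ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def is_valid_page_name(text):
    """True if `text` is non-empty and has no illegal title characters."""
    return bool(text) and not (set(text) & ILLEGAL_TITLE_CHARS)


def render_target(param):
    """Link plain page names; pass anything else through unchanged,
    on the assumption that it is already a raw link."""
    if is_valid_page_name(param):
        return "[[" + param + "]]"
    return param


plain = render_target("calmo")               # gets linked
raw = render_target("[[calmo]]")             # already a link, left alone
broken = render_target("calmo#Adjective")    # neither, shown as plain text
```

The third case is the bug: "calmo#Adjective" fails the page-name test because of the "#", so it is passed through as if it were a raw link even though it contains no brackets.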

Special:WhatLinksHere/calmo#Adjective seems to be valid, mind you. Mglovesfun (talk) 19:47, 11 March 2013 (UTC)[reply]

Yes, but in a very sneaky way. Notice that when you actually visit the page, it shows links to "calmo" alone. When your browser sees that URL, it actually strips off the # part, so the webserver never sees #Adjective. If you ever actually sent a request for "calmo#Adjective" to the server, it would probably shout at you for providing an invalid request. :) —CodeCa t 20:43, 11 March 2013 (UTC)[reply]
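The fragment stripping can be seen with Python's standard library, which splits a URL the same way a browser does before sending the request. The full URL below is assumed from the page name given above.

```python
from urllib.parse import urldefrag

url = "https://en.wiktionary.org/wiki/Special:WhatLinksHere/calmo#Adjective"

# urldefrag separates the part that is sent to the server from the
# fragment, which stays on the client side.
sent_to_server, fragment = urldefrag(url)
```

Here `sent_to_server` ends in "Special:WhatLinksHere/calmo" and `fragment` is "Adjective", so the server only ever sees a request for "calmo".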

Javascript to tackle 404-errors

I previously posted it in the beer corner, but I figured the grease pit might be more appropriate: I rewrote my example userscript which, upon hitting a 404 error page, scans other wiktionaries to see if the word exists there, and if so, displays them as interwiki.
Enable the userscript at User:Stratoprutser/404_native.js and test it out with klompvoet, danim, or real nonexistent words. -- Stratoprutser (talk) 13:34, 11 March 2013 (UTC)
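For readers curious how such a check can work, one possibility is to ask each wiktionary's MediaWiki API whether the title exists. The Python sketch below only builds the query URLs and makes no network requests. It is not taken from the userscript itself, and the function name and the list of language codes are illustrative.

```python
from urllib.parse import urlencode


def existence_query_url(lang_code, title):
    """Build an API URL asking one wiktionary about one page title."""
    params = {"action": "query", "titles": title, "format": "json"}
    return "https://%s.wiktionary.org/w/api.php?%s" % (
        lang_code, urlencode(params))


urls = [existence_query_url(code, "klompvoet") for code in ("nl", "de", "fr")]
```

In the JSON reply to a query like this, a page that does not exist is flagged as missing, so a script can keep only the wikis where the word was found.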

No bot owner template?

I miss this template from the English Wikipedia, and it seems hard to introduce here. --Njardarlogar (talk) 17:54, 11 March 2013 (UTC)[reply]

{{bot owner}} should be fine. Mglovesfun (talk) 19:47, 11 March 2013 (UTC)[reply]

Or “I operate NjardarBot (talk • contribs).” No need for a userbox. :-) (If you really want a userbox, by the way, then that's a policy matter, not a technical question, and belongs at BP, not here.) —Ruakh_TALK 02:36, 12 March 2013 (UTC)[reply]

We don't need babel boxes; personal fluency levels could be included in the text with more detailed specifications. We don't need user pages either; we could include all that information on Wiktionary:Stasi files.

We should have {{bot owner}} because it standardises and makes more accessible highly relevant user information. --Njardarlogar (talk) 08:43, 12 March 2013 (UTC)[reply]

Etymology trees

A lot of proto- language entries duplicate some of the descendants content. For example, if a Proto-Germanic word is descended from a PIE word, the PG descendants are duplicated in the PIE entry. These often get out of sync, and require many edits to synchronize. Some entries (most?) even just don't duplicate them at all, and require the reader to click the link to find out the further descendants. Couldn't this be fixed by putting the entire tree into a standalone wiki page (maybe in the appendix or template space) and having lua scripts run through the things to pull out the relevant parts? Is this feasible? --Yair rand (talk) 21:31, 12 March 2013 (UTC)[reply]

That could work, but it could turn rather nasty in itself if we have to deal with sub-descendants and sub-sub... For example, part of the Germanic tree would be duplicated on an Old Dutch entry, and part of its tree would in turn go in a Middle Dutch entry. So while it's a good idea, we should be very clear about when it should be applied and when not. Also, another point to consider is that a single "line" in the PIE descendants might have several words in it, each of which might have a separate entry and a list of descendants of its own; see *bʰerǵʰ- for an example. If we go with your approach, those would have to be split into several lines. —CodeCa t 21:40, 12 March 2013 (UTC)[reply]

If we're using Lua I assume we would be able to give it instructions as to which parts of the tree to display (for example, in a Dutch entry you may want to not go back all the way to the PIE root). So I don't see duplication as a problem with this approach.

I don't understand your point about multiple "lines" for one entry- can you restate it? DTLHS (talk) 23:41, 12 March 2013 (UTC)[reply]

At Reconstruction:Proto-Indo-European/bʰerǵʰ-, the Germanic line lists two separate Proto-Germanic forms. Both of these forms are derived from the same PIE etymon, but they're separate forms, and have separate descendants. So the descendants of the PIE etymon form a tree in multiple dimensions: not necessarily just one branch per daughter language. —Ruakh_TALK 06:58, 13 March 2013 (UTC)[reply]
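To make the proposal concrete, here is a minimal sketch of a descendants tree stored once and queried per entry. It is written in Python purely for illustration (an actual implementation here would be a Lua module), and the data layout, the function names `find`, `render`, and `descendants`, and the sample forms are all assumptions rather than anything taken from an existing page. Note that a parent may have several children in the same language, which covers the two Proto-Germanic forms mentioned above.

```python
# Illustrative data only: a tiny descendants tree keyed by (language, term).
# The layout and the sample forms are assumptions, not copied from an entry.
TREE = {
    "term": ("Proto-Indo-European", "*bʰerǵʰ-"),
    "children": [
        {"term": ("Proto-Germanic", "*bergaz"), "children": [
            {"term": ("Old English", "beorg"), "children": [
                {"term": ("English", "barrow"), "children": []},
            ]},
        ]},
        {"term": ("Proto-Germanic", "*burgz"), "children": [
            {"term": ("Old English", "burg"), "children": [
                {"term": ("English", "borough"), "children": []},
            ]},
        ]},
    ],
}


def find(node, term):
    """Depth-first search for the node whose term matches."""
    if node["term"] == term:
        return node
    for child in node["children"]:
        found = find(child, term)
        if found is not None:
            return found
    return None


def render(node, depth=0):
    """Return the subtree rooted at node as wiki-style bulleted lines."""
    lang, word = node["term"]
    lines = ["*" * (depth + 1) + f" {lang}: {word}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


def descendants(term):
    """Lines for everything below the given term (the term itself excluded)."""
    node = find(TREE, term)
    if node is None:
        return []
    lines = []
    for child in node["children"]:
        lines.extend(render(child))
    return lines
```

With this layout, the Proto-Germanic entry and the PIE entry would both read from the same tree, each asking only for the part below its own term, so the two lists could not drift out of sync.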

404 errors?

I keep getting this error randomly when I visit pages:

Not Found
The requested URL /w/index.php was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Is anyone else getting that too? It's very annoying... —CodeCa t 00:23, 13 March 2013 (UTC)[reply]

Me too. DCDuring TALK 00:51, 13 March 2013 (UTC)[reply]

On 'pedia too. Which is a good thing, because that means somebody will actually care and if it's a fixable problem, it'll be fixed soon. —Μετάknowledge^{discuss/deeds} 01:02, 13 March 2013 (UTC)[reply]

Genitive of proper nouns

The template {{genitive of}} puts words into the appropriate "... noun forms" category. Is that appropriate for proper nouns? See Kleinasiens as an example. SemperBlotto (talk) 11:35, 13 March 2013 (UTC)[reply]

pos=proper noun. Mglovesfun (talk) 11:42, 13 March 2013 (UTC)[reply]

OK - that puts it into both cats (presumably intentionally). SemperBlotto (talk) 11:46, 13 March 2013 (UTC)[reply]

I would prefer the forms of proper nouns to be in the normal "noun forms" category. There is already some disagreement on whether proper nouns are as distinct from nouns as we consider them to be, and making that same distinction in forms is a bit overboard. I can't really think of a good reason why someone would want to look up a list of proper noun forms specifically. —CodeCa t 14:56, 13 March 2013 (UTC)[reply]

Yes, I tend to agree with you. Actually, I have often wondered if our users ever use any of our massive range of categories at all - does anyone have any evidence that they do? I think that their main use is for editors to see what words we have, and especially what similar words may be missing. SemperBlotto (talk) 15:58, 13 March 2013 (UTC)[reply]

I usually treat the form-of categories as kind of a "because it has to be in at least one category" thing. So I don't usually make any further subdivisions. —CodeCa t 16:01, 13 March 2013 (UTC)[reply]

Meetup & videostream tomorrow - focus on Lua

Tomorrow's meetup at Wikimedia Foundation headquarters in San Francisco focuses on how Lua as a templating/scripting language improves our sites, and includes a brief introduction to Lua. It'll also be streamed live on the web, and the video will be posted afterwards. Please feel free to visit or watch! Sumana Harihareswara, Wikimedia Foundation Engineering Community Manager (talk) 15:46, 13 March 2013 (UTC)[reply]

conjugation template for German reflexive verbs

Have we got a conjugation table template for German verbs that are reflexive? What about for verbs that are both reflexive and separable, e.g. fremdschämen (which can also be inseparable, and so really needs two tables), which conjugates like "ich schäme mich fremd" (and "ich fremdschäme mich")? - -sche (discuss) 21:56, 13 March 2013 (UTC)[reply]

I have always preferred not to have separate entries for reflexive verbs if they are formed using separate words or clitics in a language. That especially applies to languages like Dutch or German where the word order may be vastly different. So different, in fact, that any entries we create for inflected forms will be almost useless. Just consider in how many different ways the reflexive pronoun may be arranged in a few typical German sentences. Add a separable verb into the mix and it becomes even worse. For that reason, I prefer to redirect reflexive verbs to their non-reflexive entries, and add {{reflexive}} to the specific senses. I have already done this for Dutch. —CodeCa t 01:03, 14 March 2013 (UTC)[reply]

Alright, but that doesn't answer my question. de.Wikt doesn't have entries for e.g. de:sich fremdschämen, de:sich benehmen, etc, but the tables in de:fremdschämen, de:benehmen, etc include "sich". Do we have tables that do likewise yet? If not, I can set about creating some (though I might need help). - -sche (discuss) 02:03, 14 March 2013 (UTC)[reply]

What I am saying is that we probably shouldn't have such tables. Consider a verb like irren, which has some reflexive and some non-reflexive senses. Should that entry have two conjugation tables, both containing the exact same conjugated verb forms, but one with the reflexive pronoun and one without? I don't think it should. —CodeCa t 02:15, 14 March 2013 (UTC)[reply]

A standard location for Lua transliterations

One of the obvious advantages of Lua is the ability to automatically transliterate words into Latin script. It is definitely something we'd want to add to templates like {{l}}, {{term}}, {{head}} and {{t}}. However, for that to work, there has to be a single common scheme for the functions that do the transliteration. The problem is that every language could have its own transliteration scheme, so just putting them all into one module will eventually run into speed issues because that module would eventually become too large. Therefore I propose that we form a single common scheme, an "interface" so to say, that transliteration functions have to adhere to so that they are interoperable with one another. Compare it to the way all of our script templates work the same way and are therefore interchangeable with one another. Is there a way we can do this for transliterations too? —CodeCa t 00:58, 14 March 2013 (UTC)[reply]

I think invoking them from Module:foo-translit is the most logical location, if that's what you mean by "scheme", but I don't really mind if people would rather have it at Module:foo-common, invoked as tr. I'd like to go on record that transliteration modules should be language-based, not script-based, to reduce the complexity (and sheer size) of individual modules. —Μετάknowledge^{discuss/deeds} 01:03, 14 March 2013 (UTC)[reply]

I know, and that is kind of what I had in mind. However, if we do it for every language, how do we handle cases where there is no transliteration module for a language yet? Is Scribunto capable of handling a failed module import gracefully? —CodeCa t 01:05, 14 March 2013 (UTC)[reply]
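One way to picture the "common interface plus graceful fallback" idea is the sketch below. It is in Python for illustration only; the module naming pattern `translit_<code>` and the function name `tr` are hypothetical. In Lua the analogous guard would presumably wrap `require` in `pcall`, but how that behaves under Scribunto is not something this sketch demonstrates.

```python
import importlib


def load_translit(lang_code):
    """Return the tr() function of a hypothetical per-language module, or None.

    The naming pattern translit_<code> is an assumption made for this sketch.
    """
    try:
        module = importlib.import_module(f"translit_{lang_code}")
    except ImportError:
        # No module exists for this language yet: report that instead of failing.
        return None
    return getattr(module, "tr", None)


def auto_translit(text, lang_code):
    """Transliterate with the language's module if there is one, else return None."""
    tr = load_translit(lang_code)
    return tr(text) if tr else None
```

A caller such as a headword template would then simply show no automatic transliteration when `auto_translit` returns `None`.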

No idea. But first of all, which location do you prefer? I ask because I'm planning on creating a bunch of these soon. —Μετάknowledge^{discuss/deeds} 01:52, 14 March 2013 (UTC)[reply]

I would prefer keeping it separate, in Module:foo-transliteration. But I just thought of something else we could try. As far as I know, transliteration isn't context-dependent: the same letter always becomes the same Latin letter(s) regardless of how it appears in the word. That means we may not even need whole functions to do this; we could just store a list of letter-pairs. And since that would consist of only data, it would be possible to add it to Module:languages (which may not contain any functions). —CodeCa t 01:58, 14 March 2013 (UTC)[reply]

That's a really bad idea IMO. For one thing, the premise is wrong (example: Korean) and for another Module:languages is already too big for me to even load it in a reasonable amount of time, last I checked, let alone edit it. —Μετάknowledge^{discuss/deeds} 02:03, 14 March 2013 (UTC)[reply]

The time for you to load it is a lot longer than the time Lua takes to load it. A recent Scribunto update actually added a function specifically for loading such large modules containing data. So the size is really not a problem, at least not if we are to believe the developers. As for the premise... when does it not apply to Korean? I had the impression that Korean was actually rather regular. Can you give an example of a single Korean letter or syllabic that can be transliterated in several different ways? Also, just to make it clear, this idea isn't meant to be able to transliterate every language, it would be hopeless to attempt it for the likes of Han characters. —CodeCa t 02:10, 14 March 2013 (UTC)[reply]

I know that... but there's still the problem of me wanting to edit it! If it gets too big, it's a real problem for editors. Anyway, my point with Korean is that if you just take the letters ㅇ, ㅗ, and ㄱ, if you combine them in one order you get 공 (gong) but in the opposite order you get 옥 (ok). Can Lua handle that? —Μετάknowledge^{discuss/deeds} 02:14, 14 March 2013 (UTC)[reply]

If our browsers can tell the difference, why can't Lua? From what I can tell in w:Korean language and computers, Hangul is encoded by combining all three individual letters into a single character. I presume that means that from a transliteration perspective, Hangul behaves as a syllabary and "gong" and "ok" are two different characters, each with a single Unicode codepoint, like Chinese characters or Kana are. —CodeCa t 02:21, 14 March 2013 (UTC)[reply]

Oh and just to clarify, exceptions like Japanese "ha" being pronounced as "wa" can simply be explicitly overridden with a tr= parameter like we have now. Automatic transliteration is simply meant to provide a useful default transliteration, but it should be possible to override it when it's wrong, just like we could override an irregular plural form. —CodeCa t 02:25, 14 March 2013 (UTC)[reply]
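The letter-pair table with a manual override can be sketched as follows. This is Python for illustration (not Lua), the three-character table is a toy, and the parameter name `tr` merely mirrors the existing template parameter; none of it is an existing module.

```python
# Toy letter-pair table for three hiragana; a real table would cover the whole script.
HIRAGANA = {"こ": "ko", "れ": "re", "は": "ha"}


def transliterate(text, table, tr=None):
    """Return the manual override if one is given, else a per-character lookup."""
    if tr is not None:
        return tr
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)
```

Called as `transliterate("これは", HIRAGANA)` this gives the regular default reading with "ha"; passing `tr="kore wa"` replaces it wholesale, in the same way an irregular plural overrides a generated one.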

Sometimes Unicode is really weird... does that mean that Module:ko-translit will be gigantic? (Yes, I'd rather foo-translit over foo-transliteration, so I will be using that as the standard now unless you have a good reason not to do so.) I agree on the exceptions, although for languages like Kyrgyz where there don't appear to be any exceptions, overrides aren't necessary. —Μετάknowledge^{discuss/deeds} 02:29, 14 March 2013 (UTC)[reply]

For Korean you need a formula to decompose hangeul characters into individual jamo. I wrote a transliteration tool a while ago in C#. I've got it somewhere at home and am happy to share it if someone wants to write a transliteration tool. I wonder what Google Translate uses to transliterate Mandarin and Japanese (often wrong, especially Japanese!). --Anatoli ^{(обсудить}/^вклад) 02:39, 14 March 2013 (UTC)[reply]

It could become pretty large, yes. Which is kind of unfortunate considering that Hangul itself is so well-structured. Hangul could be easily transliterated if we could piece apart individual code points like Anatoli said, but that would require more than we can put into a simple data table like Module:languages. On the other hand, the module we will presumably create to handle automatic transliteration, Module:transliteration, could simply be hand-coded with an exception specific to Hangul. The function could work like this: if the script is Hangul, then do some fancy code-point processing in Unicode, else use the pair-wise table. —CodeCa t 02:53, 14 March 2013 (UTC)[reply]
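The "fancy code-point processing" is mostly arithmetic, because precomposed Hangul syllables are laid out in Unicode as 19 initials × 21 vowels × 28 finals starting at U+AC00. The sketch below (Python for illustration; the function names and the deliberately tiny romanization tables are assumptions, and none of the consonant-change rules of Revised Romanization are handled) shows the decomposition and why 공 and 옥 come out differently.

```python
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")              # 19 initials
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")          # 21 medials
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable


def decompose(syllable):
    """Split one precomposed Hangul syllable into (lead, vowel, tail) jamo."""
    index = ord(syllable) - HANGUL_BASE
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


# Deliberately partial tables: just enough for the examples in this thread.
LEAD_ROM = {"ㄱ": "g", "ㅇ": "", "ㅎ": "h"}
VOWEL_ROM = {"ㅗ": "o", "ㅏ": "a"}
TAIL_ROM = {"": "", "ㄱ": "k", "ㄴ": "n", "ㅇ": "ng"}


def romanize(text):
    """Romanize Hangul syllables by decomposition; pass other characters through."""
    out = []
    for ch in text:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            lead, vowel, tail = decompose(ch)
            out.append(LEAD_ROM[lead] + VOWEL_ROM[vowel] + TAIL_ROM[tail])
        else:
            out.append(ch)
    return "".join(out)
```

The same jamo gets a different value depending on whether it is the initial or the final of the syllable, which is exactly the position information a flat letter-pair table cannot express.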

The logic for decomposing is simple; JavaScript can handle this. I will get my code when I have a chance and post the logic somewhere. The complete program had some flaws, as it didn't take into account some consonant changes, which should be reflected as per Revised romanisation. --Anatoli ^{(обсудить}/^вклад) 03:07, 14 March 2013 (UTC)[reply]

I don't understand exactly what's going on, but it sounds interesting.

For languages without manageable automatic transliteration this module should be skipped but it would be useful if people could add missing sounds or correct them, e.g. if Arabic "ظهر" were automatically transliterated as "ẓhr", an editor would edit to make it "ẓuhr" (insert the unwritten vowel). --Anatoli ^{(обсудить}/^вклад) 02:05, 14 March 2013 (UTC)[reply]

Well, for languages like Arabic, automatic transliteration wouldn't be terribly helpful if we use the basic page name for it. But I think that transliterating the fully vowel-marked version of the word could work? We already add vowels to the head= parameter, so the template/module could be written to use this instead of the page name. —CodeCa t 02:10, 14 March 2013 (UTC)[reply]

I meant whether Lua could be used when editing or adding translations (e.g. in preview), not in finished entries. Fully vowelled Arabic (not sure about Hebrew) could be transliterated, but I'm not sure whether this could be made perfect (without errors); perhaps it can, if strict spelling rules are followed (e.g. hamza is written where appropriate, and ى and ه (h) are not used instead of ي (y) and ة). --Anatoli ^{(обсудить}/^вклад) 02:27, 14 March 2013 (UTC)[reply]

Hebrew is impossible because WT:HE TR requires marking vowel stress, which Hebrew doesn't do. Yes, Arabic would require strict spelling rules to be followed which translations currently do not (but I think entries usually do). —Μετάknowledge^{discuss/deeds} 02:42, 14 March 2013 (UTC)[reply]

I suggest using it only if a transliteration is missing; a specific transliteration should override Lua. We have SO many translations and entries with no translit. Lua transliteration could carry a warning advising people that it can be incorrect (for selected languages?). Also note my reply re: Korean above. I can spend some time with whoever works on the Korean transliteration. --Anatoli ^{(обсудить}/^вклад) 02:49, 14 March 2013 (UTC)[reply]

Well, at least for Greek and Cyrillic, and generally any fully alphabetic script, the transliteration could be made flawless (but it would not include stress marks). I don't see lack of stress marks as a reason to avoid automatic transliteration altogether. A transliteration without them may not be complete, but it won't be wrong either, so it may be usable for Hebrew too. Devanagari and the other Indic scripts are encoded as alphabets in Unicode (the consonants and vowels are separate), but they need special treatment because the transliteration of the consonants depends on whether a vowel character follows ("devanāgarī" is actually encoded as "d-e-v-n-ā-g-r-ī"), so a simple pair-table would need to be supplemented by a function that suppresses the inherent vowel of a consonant when necessary. Such a function could, however, probably work for all indic scripts as long as we tell it which letters in a given script are consonants and which are vowels. —CodeCa t 02:53, 14 March 2013 (UTC)[reply]
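The inherent-vowel handling described above can be sketched roughly as follows. This is Python for illustration, the tables hold only the characters needed for the word "devanāgarī", and the function name is an assumption; language-specific details such as Hindi schwa deletion are deliberately not handled.

```python
# Toy tables: only the characters needed for the examples, written as escapes
# so the combining vowel signs stay unambiguous.
CONSONANTS = {"\u0926": "d", "\u0935": "v", "\u0928": "n", "\u0917": "g", "\u0930": "r"}
VOWEL_SIGNS = {"\u0947": "e", "\u093e": "ā", "\u0940": "ī"}
VIRAMA = "\u094d"  # explicitly cancels the inherent vowel


def transliterate(text):
    """Pair-table lookup plus inherent-vowel insertion for an Indic script."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # A consonant carries the inherent "a" unless a vowel sign
            # or a virama follows it.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue
        else:
            out.append(ch)
    return "".join(out)
```

Only the two tables and the virama are script-specific, so in principle the same function could be reused for other Indic scripts by swapping the tables.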

Actually, Cyrillic and Greek can be flawless even with stress marks :) See Module:ru-translit. —Μετάknowledge^{discuss/deeds} 02:57, 14 March 2013 (UTC)[reply]

That is a lot of code. Somehow I think that it could be a lot simpler, but I don't really know what a lot of it does (specifically, what purpose does it serve, why is it there?). How much of it is actually specific to Cyrillic? —CodeCa t 03:00, 14 March 2013 (UTC)[reply]

The scheme used for Russian is a mix of transliteration with arbitrary exceptions where parts of words are phonemically transcribed. Not a good exemplar. —Michael Z. 2013-03-14 04:41 z

The developer, also a Russian, did what he felt was right for the Russian language and what is our policy. It has described exceptions. E.g. adjective endings -ого/-его are transliterated as -ovo/-(j)evo, not -ogo/-(j)ego, it's standard. The code needs to cater for these where possible. --Anatoli ^{(обсудить}/^вклад) 04:49, 14 March 2013 (UTC)[reply]

Yes, but it is quite different from any other transliteration scheme, and is far from a typical example or prototype for any transliteration code. —Michael Z. 2013-03-14 05:34 z

Not so sure about "any other". The Japanese particles は and へ, for example, are transliterated phonemically as "wa" and "e", not as their usual hiragana readings "ha" and "he". Catering for these exceptions may be a big hurdle in some cases. What you yourself mentioned below (transliterating letters depending on their position) and what CodeCat mentioned about Indic languages will make automatic transliteration harder and will require more sophisticated code. Russian may turn out to be an easy example. --Anatoli ^{(обсудить}/^вклад) 05:44, 14 March 2013 (UTC)[reply]

Yes, romanizing logographic and syllabic scripts is more complicated than for the Cyrillic alphabet. Usually. —Michael Z. 2013-03-14 05:58 z

I don't think automated transliteration should cater to such exceptions. If the default is wrong, it should be overridden just like we do with any other template that generates a default form (such as {{en-noun}}'s plural). See my comments further down. —CodeCa t 14:09, 14 March 2013 (UTC)[reply]

(In response to Metaknowledge 02:42, 14 March 2013 (UTC).) In Hebrew, not only stress is the problem. חָכְמָה, for example, can be chochmá or chach'má (two different words). Basically, even if we wouldn't mark stress, any word with U+05B0 HEBREW POINT SHEVA and many with U+05B8 HEBREW POINT QAMATS would be ambiguous — and that's a large proportion of all Hebrew words. (That's just re general problems of automating Hebrew transliteration. I haven't been following this discussion at all, and don't understand, e.g., its first post.)—msh210℠ (talk) 06:37, 14 March 2013 (UTC)[reply]

For Russian there is currently no single entry missing transliteration, translations miss it sometimes, which I have been fixing. Still see some use for Russian in the future. Agreed about Indic. Uyghur is fully vowelised, even if it's Arabic abjad based. Armenian, Georgian seem easy. Thai, Khmer, Lao, Burmese would be complex but possible. Khmer might be the easiest, check with Stephen G. Brown.--Anatoli ^{(обсудить}/^вклад) 03:07, 14 March 2013 (UTC)[reply]

If romanization is automated, then there could be multiple schemes per language. Perhaps the displayed scheme for a language could be a user pref, or an entry could show several commonly-used romanizations. It would be reasonable to add a BGN/PCGN transliteration for a geographic name, for example, as that’s what would be seen on many maps. In the future we could decide to include other transliterations, e.g., Cyrillizations of Latin or Chinese script. The framework should leave room for this. We might also use LOC transliteration for foreign-language titles in citations, as this is used in library catalogues and bibliographies.

Romanizations aren’t necessarily straight table lookups. Some important ones include exceptions for occurrences at the beginning or end of a word, or after a vowel or consonant. But we could start by implementing ones that are straight lookups.

ISO language tags have a standard representation for transformed text, although the tags can get lengthy. This might be useful for cataloguing our schemes. —Michael Z. 2013-03-14 04:43 z

While that would be nice, I think it kind of misses the point. The point of automated transliteration, to me, is that it can provide a sensible default where it has not yet been provided. So, for example, if someone types {{head|ru|noun}} on an entry without a transliteration, automated transliteration could make one itself. But it would still be necessary and desirable to check it and to override it if necessary, so it's not meant as a substitute for manual transliterations. —CodeCa t 14:04, 14 March 2013 (UTC)[reply]

I disagree. Your position makes sense for some languages, like Russian. But we've had a real problem with some smaller languages, like Telugu, where the contributors use various transliteration schemes, ranging from ASCII to ad hoc, and often neglect transliteration altogether. The only person cleaning that mess up has been Stephen G. Brown. An automated transliteration system for Telugu will be more reliable than what users give as a transliteration value, and thus it definitely would be a substitute for manual transliteration. —Μετάknowledge^{discuss/deeds} 22:51, 14 March 2013 (UTC)[reply]

I agree with CodeCat on this one. Humans know, or should know, better. Perhaps we should be able to show both the manual and the automated transliteration; then someone with knowledge of the standards could fix a non-standard transliteration. So, to give a Russian example, no point in having automated transliteration of "что" as "čto" (incorrect), I'd prefer a manual "shto" (non-standard, but correctly showing the non-standard reading), because then I know that I have to put "što" to make it standard. I stumbled across similar problems with Bengali and Thai. There are many exceptions in readings in various languages. I already mentioned the Japanese particles は and へ. --Anatoli ^{(обсудить}/^вклад) 23:24, 14 March 2013 (UTC)[reply]

But that only makes sense if exceptions exist; if we don't know, then we should ask our local experts like Stephen or just bring the issue up at language forums. In many languages, like Greek, our system specifically says that it is trying to reproduce the orthographical conventions rather than the sounds, so there will never be exceptions.—Μετάknowledge^{discuss/deeds} 23:31, 14 March 2013 (UTC)[reply]

We can have exception-free languages, e.g. most Cyrillic-based ones, except for Russian. I don't know Telugu, but in Hindi (as in Arabic) there is strict and relaxed spelling: ग़रीब (ġarīb) can be spelled casually (but all too commonly) as "गरीब (garīb)", without the "nuqta", the dot under the letter that turns ग (ga) into ग़ (ġa). It's still "ġarīb", not "garīb". Manual transliteration should provide the correct pronunciation, as the nuqta is often ignored in Hindi (8 Devanagari letters can use it).

I personally disagree with WT:EL TR, they totally ignore the way foreign words with "b" and "d" are transliterated "μπ" /b/, "ντ" /d/, etc. --Anatoli ^{(обсудить}/^вклад) 00:19, 15 March 2013 (UTC)[reply]

no point in having automated transliteration of "что" as "čto" (incorrect)

Anatoli, there’s exactly a point of having a transliteration of the word spelled č-t-o be čto. Otherwise, it is not a transliteration. Why are you putting pronunciation into the place for transliteration? —Michael Z. 2013-03-15 01:05 z

I agree with Michael on that point. Consider what would happen if we started "transliterating" English the same way. We'd end up with something resembling enPR wouldn't we? —CodeCa t 01:15, 15 March 2013 (UTC)[reply]

That's what Lua will produce: "čto". I want to override it with "što" because "čto" is misleading and doesn't help anybody; foreigners and even some uneducated Russians still read out "что" as "čto" when it should be "što". This is a practice accepted and used over the years by editors working with Russian. This exception is not predictable like akanye, and knowledge of Russian phonology and sound changes doesn't help one arrive at the correct pronunciation of the word, so it has to be specifically explained. IPA is not sufficient: many people dislike or don't understand it, and IPA is not used in translations. --Anatoli ^{(обсудить}/^вклад) 01:21, 15 March 2013 (UTC)[reply]

I think you need to consider what transliterations are for. The purpose is to allow someone to read the word when they don't know the script. It's not meant to tell them how to pronounce the word, that's what the pronunciation section is for. Moreover, if someone is able to read Cyrillic, they shouldn't need the transliteration, should they? So if you think about it, someone who doesn't need the transliteration will end up reading the Cyrillic letters что (čto) while someone who does need it will find što instead. That is just inconsistent, and it's rather strange that the transliteration (which is meant as a reading aid) gives different information. I think the transliteration should only have information in it that can also be deduced from the original script (in combination with the regular phonology/orthography of the language). If you really want to show that что is to be read as što, then that should be written in addition to the regular transliteration čto, not replacing it. —CodeCa t 01:32, 15 March 2013 (UTC)[reply]

I think the present situation is fine, but if you want to have this conversation, move it to the BP. —Μετάknowledge^{discuss/deeds} 01:38, 15 March 2013 (UTC)[reply]

MK, what is relevant here is that transliteration schemes that meet the criteria for transliteration schemes also tend to be suitable for mechanical transliteration (whether it be by machine or by a non-native reader). Substituting a complex, proprietary, ambiguously-defined, phonemic transcription system is a loss for readers, editors, and openness, as well as for automated transliteration. We should define some baseline standards for transliteration. —Michael Z. 2013-03-15 14:42 z

Anatoli, one objective of having both transliteration and pronunciation is exactly to show when the two differ. By not transliterating the word (which requires a respect for its letters), you are obscuring that very information, potentially contributing to the problem you describe. If accessibility of the pronunciation is lacking, then improve the pronunciation, as you have done in some entries, instead of destroying the transliteration. —Michael Z. 2013-03-15 14:42 z

Think about what would happen if someone who just started learning Cyrillic comes across что. They have just learned that ч is č or ch or some variety. Yet here they suddenly see what they think is a "wrong" transliteration, so they will correct it. That's what I'd probably do too if I found this. —CodeCa t 15:05, 15 March 2013 (UTC)[reply]

Relevant: #Criteria for romanization systems —Michael Z. 2013-03-15 20:41 z

People who start learning Japanese and learn hiragana see the phrase これはなんですか。 If they only know hiragana they will read "kore ha nan desu ka?". An automatic transliterator would also romanise it so. You need a person who knows Japanese to correct it and say it's "kore wa nan desu ka?". This is how Japanese is transliterated. There's no difference with the Russian "что это?", which is "što éto?", not "čto éto?". It's not just a matter of understanding the writing system. Transliterating letter by letter ("čto eto") is just unhelpful in this case. Foreign users can ask if they think it's "wrong"; native users understand exactly why it's transliterated that way. People who know Cyrillic but don't know the exception will misread the word. I'm OK with Lua transliterating the default way (taking into account some basic rules of changing, like поезд (pójezd) but небо (nébo), "е" = je/e; берёза (berjóza) but жёлтый (žóltyj), "ё" = jo/o), but it's up to editors with the knowledge to override the default and correct it.

If anyone wants to check the complicated rules of Korean standard transliteration, read w:Revised Romanization of Korean; there are too many consonant changes, like ㅂ (b) + ㄴ (n) (b + n = mn), ㄹ (l) + ㄴ (n) (l + n = ll, nn). @Michael, please just stop talking about "destruction" of the transliteration. --Anatoli ^{(обсудить}/^вклад) 10:42, 16 March 2013 (UTC)[reply]
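For reference, a positional rule such as the "е" = je/e one mentioned above needs only a little context on top of a plain table. The following is a minimal sketch in Python (for illustration, not Lua), with a toy table covering just the example words; the function name is an assumption, stress marks are ignored, and other contexts (for instance after ь or ъ) are not handled.

```python
VOWELS = set("аеёиоуыэюя")
# Toy table: only the letters needed for the two example words.
TABLE = {"п": "p", "о": "o", "з": "z", "д": "d", "н": "n", "б": "b"}


def transliterate(word):
    """Table lookup, except that "е" depends on the preceding letter."""
    out = []
    prev = ""
    for ch in word.lower():
        if ch == "е":
            # "je" word-initially or after a vowel, plain "e" after a consonant.
            out.append("je" if prev == "" or prev in VOWELS else "e")
        else:
            out.append(TABLE.get(ch, ch))
        prev = ch
    return "".join(out)
```

Rules of this kind are still fully predictable from the spelling; unpredictable cases such as что are a separate matter and would still need a manual value.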

Sorry, Anatoli, but you’re still, apparently willfully, missing the point of transliterating Russian Cyrillic when you talk about “correcting” it into something that fails most objective criteria for transliteration. Russian is not Japanese or Korean. Also, a small number of editors are lording it over your own little empire and disregarding the needs of the readers when you create a proprietary “system” full of vagaries, refuse to clarify them, and insist that everyone else doesn’t understand it well enough to have a valid opinion about it. —Michael Z. 2013-03-23 17:39 z

An automated transliteration of Burmese would generate the ALA-LC system very easily, and could probably be made to generate the MLCTS as well, but four years ago Stephen and I reached the compromise that Burmese entries would show four romanization systems (two that are orthography-faithful transliterations and two that are pronunciation-faithful transcriptions), while Burmese words mentioned on other pages (e.g. in Etymology and Translations sections) would just use the pronunciation-faithful BGN/PCGN transcription. —An gr 11:02, 16 March 2013 (UTC)[reply]

For multiple transliteration methods, we could use multiple modules named in the form:

{{ my-translit| မြန်မာဘာသာ}} – default romanization, e.g. BGN
{{ my-alaloc-translit| မြန်မာဘာသာ}} – other romanization, e.g. ALA-LC

Or a single module with a transliteration, method, or scheme argument for the non-default methods:

{{ my-translit| မြန်မာဘာသာ}}
{{ my-translit| မြန်မာဘာသာ| method=alaloc}}

Standard tags for ISO t extension are

alaloc – American Library Association-Library of Congress
bgn – US Board on Geographic Names
buckwalt – Buckwalter Arabic transliteration system
din – Deutsches Institut für Normung
gost – Euro-Asian Council for Standardization, Metrology and Certification
iso – International Organization for Standardization
mcst – Korean Ministry of Culture, Sports and Tourism
satts – Standard Arabic Technical Transliteration System
ungegn – United Nations Group of Experts on Geographical Names

Specific versions are typically tagged like ungegn-2012. Non-standard methods would be tagged with an “x” private-use code, e.g., x-wikt. —Michael Z. 2013-03-17 20:54 z
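The "single module with a method argument" variant might look roughly like this. It is a Python sketch for illustration only; the registry layout, the function name `translit`, and the two-letter Russian toy tables are assumptions, chosen because the two schemes visibly differ on "я".

```python
# Toy tables keyed by (language, method tag); real tables would be complete.
SCHEMES = {
    ("ru", "x-wikt"): {"я": "ja", "д": "d"},
    ("ru", "bgn"): {"я": "ya", "д": "d"},
}
# Which method is used when the caller does not name one.
DEFAULT_METHOD = {"ru": "x-wikt"}


def translit(text, lang, method=None):
    """Transliterate with the named scheme, or the language's default scheme."""
    method = method or DEFAULT_METHOD[lang]
    table = SCHEMES[(lang, method)]
    return "".join(table.get(ch, ch) for ch in text)
```

Keying the registry by the same tags as the list above would keep the scheme names consistent between templates, modules, and any language tags we emit.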

But the problem is that only two of the systems in use here are predictable from the spelling—the other two (including the one used outside the Burmese pages themselves) are not (always) predictable from the spelling. Though I suppose the BGN/PCGN transcription is predictable often enough that it will be OK as long as it's possible to manually override the automatic transcription, e.g. via

{{my-translit|မြန်မာစကား|tr=myanmazăga:}}

to prevent the template from automatically generating myanmasăka:. —An gr 10:20, 19 March 2013 (UTC)[reply]

Does that correspond to note 1 on page 3 of this standard, or lines 1 and 4 of the first table in this one? It looks like it might be predictable, but requiring some more-complicated programming. A manual override like you describe sounds like a good compromise, until and if that programming can be added. —Michael Z. 2013-03-22 19:04 z

dump grep request: Hebrew section SGML comments

Can someone please generate a list of pages, each of which has a ==Hebrew== section containing <!?—msh210℠ (talk) 06:52, 14 March 2013 (UTC)[reply]
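For anyone wanting to rerun this kind of request against a dump, the per-page check could be sketched as below. This is Python for illustration; the function name and the default marker `<!--` are assumptions, and reading the pages out of the dump itself is left out.

```python
import re

# Matches level-2 headers such as "==Hebrew==" but not "===Noun===".
L2_HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$")


def section_contains(wikitext, language="Hebrew", marker="<!--"):
    """True if the given language section of a page contains the marker."""
    current = None
    for line in wikitext.splitlines():
        m = L2_HEADER.match(line)
        if m:
            current = m.group(1)          # entering a new language section
        elif current == language and marker in line:
            return True
    return False
```

Running this over every page's text and printing the titles for which it returns true would give a list like the one requested.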

את . שוקולד . אגרוף . אגרוף תאילנדי . אינדונזי . אנגלית . מים . טוב . אלוהים . בן . דרום . היה . גדל . מצא . פן . עם . ילד . עור . אהרן . ויקרא . אחת . לבן . תוכי . כי . ז־כ־ר . כדור . אח . מת . אי . שם . מספר . נזהר . ישן . כוס . תת . ציבור . זה . יותר . ירדן . טבע . איזה . צילם . מתמטיקה . הפעיל . האיר . אָ . הבא . פקד . בער . ־ון . בא . קשת . קורס . נשא . שלח . עאכ״ו . פחות . באשר . שימוש . כפה . אלהים . צפון . ־ים . הבין . סדין . נפלא . מאפיה . התקלח . פסח . י־ל־ד . מעות חטים . ־ה . זיין . קעקע . גת . יום טוב . הזיע . י״ט . הארץ . הטהר . הצטנן . השתדל . השתכנע . הסתעף . השתמר . השתכר . התלכלך . התקמט . התחבא . התלבש . התקשר . השתתף . התנכר . תיכנת . ארצה . רעש . הביא . מלח לימון . חומצת לימון . חשמן . חרש . כרית . מחמד . ארוחת עשר . בנים . נ־כ־ר . ילדים . לבד . גילה . מרדכי . גלעד . פרו . ישרצו . כהה . מלכי־צדק . זכור . תמלא . יאמר . עמו . הבה . נתחכמה . ירבה . בנו . ישימו . משוגע . תיראן . מצה . תחיין . יראו . תחיון . לכי . תכה . שמך . רעך . להרגני . נודע . אסרה . אראה . כדי . להעלתו . העלה . בני ישראל . ואמרו . אלי . מכרה . ושמעו . נלכה . שלשת . ושלחתי . הכה . עיני . תשליך . ידו . והיה . ולקחת . פיך . והוריתיך . אוצר . שוק שחור . אשובה . מערכת הפעלה . מעבר לים . אינדונזיה . מג״ב . כוח . הקב״ה . האט . חומוס . קטון . ניגש . ויאמן . וישמעו . ענים —Ruakh_TALK 03:37, 15 March 2013 (UTC)[reply]

Many thanks.—msh210℠ (talk) 15:37, 15 March 2013 (UTC)[reply]
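For anyone wanting to repeat this kind of dump grep, here is a minimal Python sketch of the per-page check. It is not the method actually used to produce the list above. It assumes the wikitext of each page has already been pulled out of the dump as a string, and the function names are made up for illustration.

```python
import re

# Matches level-2 headings only, e.g. "==Hebrew==" but not "===Noun===".
L2_HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def language_section(wikitext, language):
    """Return the text of the given language section, or None if absent."""
    headings = list(L2_HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        if heading.group(1).strip() == language:
            # The section runs until the next level-2 heading or end of page.
            if i + 1 < len(headings):
                end = headings[i + 1].start()
            else:
                end = len(wikitext)
            return wikitext[heading.end():end]
    return None


def section_contains(wikitext, language, needle):
    """True if the language section exists and contains the needle string."""
    section = language_section(wikitext, language)
    return section is not None and needle in section
```

Calling `section_contains(text, "Hebrew", "<!")` on every page of the dump and collecting the titles that return true would give a list like the one above.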

Edittools?

Anyone else having trouble with Edittools in Chrome? Using Chrome on Win 7. Edittools were working fine this morning, but I get back from lunch and they completely fail to load, not even the default ones... -- Eiríkr Útlendi │ Tala við mig 20:23, 14 March 2013 (UTC)[reply]

Downloadtools

Is there any kind of API or database from which I can download some Ogg files from Wiktionary?

Best regards --77.47.30.210 21:26, 14 March 2013 (UTC)[reply]

A function to convert Korean hangeul to Roman letters (basic) in C#

This is the code I promised to share for converting Korean hangeul to Roman letters. The code breaks up hangeul blocks into jamo components, e.g. 한 (han) = ㅎ (h), ㅏ (a), and ㄴ (n).

I can also give the full C# code for the graphical program (it includes Cyrillisation of Korean). You just need a C# compiler (csc.exe).

The code also handles ㄹ (l) (l/r) but doesn't cover all cases.

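		// Note: useSyllableDelimiter and keepOriginal are not declared in this snippet;
		// they are presumably bool fields of the containing class.
		private string romanize(string stringToConvert)
		{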
			string result = "";

			string [] rLeads = {"g", "gg", "n", "d", "dd", "r", "m", "b", "bb", "s", "ss", "", "j", "jj", "ch", "k", "t", "p", "h"};
			string [] rVowels = {"a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "oa", "oae", "oi", "yo", "u", "ueo", "ue", "ui", "yu", "eu", "eui", "i"};
			string [] rTails = {"g", "gg", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb", "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "c", "k", "t", "p", "h"};
			char currentChar;
			int index = 0;
			string l = "";
			string v = "";
			string t = "";
			int charInt = 0;
			string syllable = "";
			bool wasVowel = false;

			for (int i = 0; i < stringToConvert.Length; i++)
			{
				currentChar = stringToConvert[index];
				
				if (((int)currentChar >= 44032) && ((int)currentChar <= 55203))
				{
					charInt = (int)currentChar;
					try
					{
						l = rLeads[((charInt - 44032) / 588)];
						//convert R to L if after a consonant
						if	((l == "r") && (!wasVowel))
							l = "l";
					}
					catch (IndexOutOfRangeException ex)
					{
						l = "";
					}

					// tail index 0 means "no final consonant"; rTails starts at tail index 1
					int tailIndex = (charInt - 44032) % 28;
					t = (tailIndex > 0) ? rTails[tailIndex - 1] : "";
					
					try
					{
						v = rVowels[((charInt - 44032 - (charInt - 44032) % 28) % 588) / 28];
					}
					catch (IndexOutOfRangeException ex)
					{
						v = "";
					}
					
					syllable = l + v + t;
					if ((syllable.Substring(syllable.Length -1, 1) == "a")||
						(syllable.Substring(syllable.Length - 1, 1) == "e")||
						(syllable.Substring(syllable.Length - 1, 1) == "i")||
						(syllable.Substring(syllable.Length - 1, 1) == "o")||
						(syllable.Substring(syllable.Length - 1, 1) == "u"))
					{
						wasVowel = true;
					}
					else
					{
						wasVowel = false;
					}

					if (useSyllableDelimiter)
						result = result + syllable + "-";
					else
						result = result + syllable;
				}
				else
				{
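					//trim the syllable delimiter if the next character isn't Korean;
					//a literal "-" that came from the input must be left alone
					char prev = (index > 0) ? stringToConvert[index - 1] : ' ';
					bool prevWasHangul = (prev >= 44032) && (prev <= 55203);
					if (useSyllableDelimiter && prevWasHangul && result.EndsWith("-"))
						result = result.Substring(0, result.Length - 1) + currentChar;
					else
						result = result + currentChar;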
				}
				index++;
			}

			if (keepOriginal)
				return stringToConvert + "\n" + result;
			else
				return result;
		}

Hopefully someone gets interested in making a transliteration tool for Korean. The above code is basic: it converts more or less the way Google Translate does, except that there the finals are "k", "p" and "t" rather than the "g", "b" and "d" produced here; the former is more standard. It doesn't take into account the changes required by Revised Romanisation (the current standard in South Korea), but if you're able to start, then I'll help to get the rules, which are not too complex. --Anatoli ^{(обсудить}/^вклад) 04:35, 15 March 2013 (UTC)

Example conversion of a text from Korean Wikipedia:

Source:

한국어(韓國語)는 주로 한반도에서 쓰이는 언어로, 대한민국에서는 한국어, 한국말이라고 부른다. 조선민주주의인민공화국에서는 조선어(朝鮮語), 중국(조선족 위주)에서도 조선어(朝鮮語)로 불린다. 카자흐스탄 등 구 소련의 고려인들 사이에서는 고려말(高麗말)로 불린다.

19세기 이후 한반도와 주변 국가의 정치 사회상 변화에 따라 중국(특히 옌볜 조선족 자치주), 일본, 러시아(특히 연해주와 사할린), 우즈베키스탄, 카자흐스탄, 미국, 캐나다 등에 한민족(韓民族)이 이주하면서 이들 지역에서도 한국어가 쓰이고 있다. 한국어 사용 인구는 전 세계를 통틀어 약 8천200만 명으로 추산된다.[1] 일제 강점기에는 일본 제국의 문화 말살 정책으로 상당한 핍박을 받았다.

Converted text (needs tweaking, I know):

hangugeo(韓國語)neun juro hanbandoeseo sseuineun eoneoro, daehanmingugeseoneun hangugeo, hangugmalirago bureunda. joseonminjujueuiinmingonghoagugeseoneun joseoneo(朝鮮語), junggug(joseonjog uiju)eseodo joseoneo(朝鮮語)ro bullinda. kajaheuseutan deung gu soryeoneui goryeoindeul saieseoneun goryeomal(高麗mal)lo bullinda.

19segi ihu hanbandooa jubyeon guggaeui jeongchi sahoisang byeonhoae ddara junggug(teughi yenbyen joseonjog jachiju), ilbon, leosia(teughi yeonhaejuoa sahallin), ujeubekiseutan, kajaheuseutan, migug, kaenada deunge hanminjog(韓民族)i ijuhamyeonseo ideul jiyeogeseodo hangugeoga sseuigo issda. hangugeo sayong inguneun jeon segyereul tongteuleo yag 8cheon200man myeongeuro chusandoinda.[1] ilje gangjeomgieneun ilbon jegugeui munhoa malsal jeongchaegeuro sangdanghan pibbageul badassda.

--Anatoli ^{(обсудить}/^вклад) 04:46, 15 March 2013 (UTC)[reply]
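For readers without a C# compiler, the block arithmetic used by the function above can be sketched in Python as follows. This is only an illustration that reuses the same letter tables; it leaves out the r/l adjustment and the delimiter options, and the names are invented here.

```python
HANGUL_BASE = 0xAC00  # 44032, first precomposed syllable
HANGUL_LAST = 0xD7A3  # 55203, last precomposed syllable

LEADS = ["g", "gg", "n", "d", "dd", "r", "m", "b", "bb", "s", "ss", "",
         "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "oa", "oae",
          "oi", "yo", "u", "ueo", "ue", "ui", "yu", "eu", "eui", "i"]
# Index 0 stands for "no final consonant", so no "- 1" is needed on lookup.
TAILS = ["", "g", "gg", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
         "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "c",
         "k", "t", "p", "h"]


def decompose(ch):
    """Split one precomposed syllable into (lead, vowel, tail) indices."""
    offset = ord(ch) - HANGUL_BASE
    lead, rest = divmod(offset, 588)   # 588 = 21 vowels * 28 tails
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail


def romanize(text):
    """Romanize hangeul syllables; pass every other character through."""
    out = []
    for ch in text:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            lead, vowel, tail = decompose(ch)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)
    return "".join(out)
```

With these tables, 한 decomposes into lead 18 (h), vowel 0 (a) and tail 4 (n), which matches the han example given earlier.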

I have Luacized that function, cleaned it up slightly (IMHO; YMMV), and put it at Module:ko-utilities. —Ruakh_TALK 03:11, 17 March 2013 (UTC)[reply]

@Anatoli: Which cases doesn't it cover?

@Ruakh: I'm going to put that at Module:ko-translit with the function being named rv (to match Korean template parameters). Just thought I'd let you know; if there's a problem with me doing that you can move it back. —Μετάknowledge^{discuss/deeds} 03:29, 17 March 2013 (UTC)[reply]

Can you give it a longer name? "rv" doesn't really mean much. —CodeCa t 03:31, 17 March 2013 (UTC)[reply]

Decided not to change the function's name for now. The reason for rv is that there are multiple transliteration systems for Korean. Wiktionary primarily uses Revised Romanization, but entries often use {{ko-pron}} to show three more methods, one of which cannot be reliably deduced from the hangeul alone (nor can the IPA, for that matter). We should Luacize all possible methods used on Wiktionary. —Μετάknowledge^{discuss/deeds} 03:36, 17 March 2013 (UTC)[reply]

I was actually hoping for something like "revised_romanization" or maybe shorter "revised_rom" if you want. —CodeCa t 03:40, 17 March 2013 (UTC)[reply]

Ruakh, thanks for the effort, but do you have a working version so far? The current module was renamed to Module:ko-translit, which requires Module:ko-hangul. I tried to call it but it didn't work. Not sure if you're in the middle of development.

@Metaknowledge, before we can start tweaking details, we need to get the basic functionality to work. --Anatoli ^{(обсудить}/^вклад) 11:13, 17 March 2013 (UTC)

It works just fine, you just don't understand how to use Scribunto modules. Please read Wiktionary:Scribunto. —Ruakh_TALK 16:13, 17 March 2013 (UTC)[reply]

Lua loops?

Does anyone know what happens if you put a never-ending loop into a Lua module? Does it stop the entire wiki? SemperBlotto (talk) 18:23, 16 March 2013 (UTC)[reply]

It wouldn't stop everything as far as I know, there is a time limit. Why not try it? —CodeCa t 18:30, 16 March 2013 (UTC)[reply]

I somehow doubt that the servers would give exclusive access to one process from one instance of one page, let alone have no time limit on it. If they did, the system programmers should be fired as grossly incompetent. The worst that might happen would be that the page would freeze up for the person viewing the page. Chuck Entz (talk) 19:15, 16 March 2013 (UTC)

Interlanguage links

I have a question unrelated to Wiktionary and hope someone can point me in the right direction.

For a small wiki I sometimes contribute to, I want to introduce other-language versions. The wiki is small, though, so we don't want the overhead of multiple wikis. I'm trying to come up with a solution for the wikimaster, but I don't understand the configuration aspects very much.

My idea is to have the language links at the left link to a subdirectory. For example, if you are on "thisPage.html" and click "Spanish" in the language list at left, it would go to my.wiki.org/es/questaPagina.html.

I've found articles like mw:Manual:$wgInterwikiMagic, but nothing that addresses something exactly like this. Any suggestions welcome.

--BB12 (talk) 20:44, 16 March 2013 (UTC)[reply]

I don't get it, anyone? Mglovesfun (talk) 21:12, 16 March 2013 (UTC)[reply]

I'm happy to explain it differently. What don't you get? --BB12 (talk) 21:13, 16 March 2013 (UTC)[reply]

How about this: What's the easiest way to have a multilingual wiki in a case where the URL is wiki.myweb.org (so I can't have es.myweb.org, etc.)? --BB12 (talk) 22:16, 16 March 2013 (UTC)[reply]

(e/c) On this wiki, a page with the absolute URL http://en.wiktionary.org/wiki/this might contain the interwiki link [[fr:this]], which is a link to http://fr.wiktionary.org/wiki/this. If I understand correctly, BB wants it to be a link like http://en.wiktionary.org/wiki/fr/this instead (but on his wiki, not on Wiktionary). - -sche (discuss) 22:18, 16 March 2013 (UTC)[reply]

Yes, that seems, to me, to be the easiest way to make a wiki multilingual. I would think this is a really simple tweak in the settings, but I haven't gotten anywhere with the wikimaster, so I was wondering if someone here could point me where to go or suggest what should be done. --BB12 (talk) 00:49, 17 March 2013 (UTC)[reply]

I think that such a thing could be done by creative use of the interwiki-map (e.g., mapping es to //my.wiki.org/es/$1.html), but it seems messy and potentially fragile. For example, I could easily imagine getting everything working so that [[thisPage]] links just fine to its Spanish counterpart, but then having no way for that Spanish counterpart to link back to the English.

Instead, I'd suggest that you do something similar to how en.wikt produces sidebar links to Wikipedia when you use e.g. {{projectlink|pedia}}. The way that works is, the template produces wikitext like [[w:...|Wikipedia]], which results in HTML like <a href="//en.wikipedia.org/wiki/..." class="extiw" title="w:...">Wikipedia</a>. We then use CSS to prevent that link from being displayed normally, and we use JS to move it into the sidebar. In your case, you'd presumably add interwiki-links via a template like {{interwikis|es=questaPagina|fr=cettePage}} or whatnot.

You'd probably also want to use mod_rewrite to implicitly add uselang=es to Spanish pages, so that the whole interface is in Spanish, rather than just the content.

—Ruakh_TALK 02:13, 17 March 2013 (UTC)[reply]
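To make the interwiki-map idea concrete, here is a small Python sketch of how a prefixed link could be expanded by substituting the page title for $1. It only illustrates the mapping described above; the fr entry and the function name are hypothetical, and a real wiki would configure this in its interwiki table rather than in code.

```python
from urllib.parse import quote

# Prefix -> URL pattern, in the style of an interwiki map ($1 = page title).
INTERWIKI_MAP = {
    "es": "//my.wiki.org/es/$1.html",
    "fr": "//my.wiki.org/fr/$1.html",  # hypothetical second language
}


def resolve_interwiki(link):
    """Expand 'prefix:Title' to a URL, or return None for ordinary links."""
    prefix, sep, title = link.partition(":")
    if not sep or prefix not in INTERWIKI_MAP:
        return None
    # Spaces become underscores in page titles; the rest is URL-encoded.
    encoded = quote(title.replace(" ", "_"))
    return INTERWIKI_MAP[prefix].replace("$1", encoded)
```

Note that the sketch only covers the outbound direction; the problem of the Spanish page linking back to the English one is not addressed by it.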

Thank you for the suggestion. I have passed that on to the wikimaster! --BB12 (talk) 17:44, 17 March 2013 (UTC)[reply]

Some Latin templates now only ever require one parameter -- How about making it all of them?

{{l/la}} and {{la-decl-1st}} can now be passed a single parameter with macrons and the templates will automatically generate the macronless version of the word.

e.g. While you can still generate an inflection table with:

{{la-decl-1st|stell|stēll}}

now you can instead simply use:

{{la-decl-1st|stēll}}

The magic happens in Module:Latin, written in Lua. I'd recommend also using the same logic in {{l|la|...}} and making the requirement for two versions of Latin words a thing of the past.

Hopefully I haven't broken anything. Pengo (talk) 14:59, 17 March 2013 (UTC)[reply]
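As an illustration of the idea (not the actual Lua in the module), the macron stripping and the one-or-two-parameter behaviour could be sketched in Python like this. The function names are made up, and the sketch assumes macrons are the only diacritic that needs removing.

```python
import unicodedata

MACRON = "\u0304"  # combining macron


def strip_macrons(word):
    """Remove macrons by decomposing, dropping U+0304 and recomposing."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def stem_pair(first, second=None):
    """Return (macronless, macronized) stems.

    With two arguments both are used unchanged (the old behaviour);
    with one argument the macronless form is generated automatically.
    """
    if second is not None:
        return first, second
    return strip_macrons(first), first
```

Under these assumptions, `stem_pair("stēll")` and `stem_pair("stell", "stēll")` yield the same pair, which is the equivalence between the two template calls shown above.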

I have changed the name to Module:la-utilities, and I changed {{l/la}} to reflect that. I think the next obvious step with Latin templates is to merge {{la-decl-2nd}} and {{la-decl-2nd-N}}, and {{la-decl-2nd-ER}} (they should all eventually be a redirect to the first one), because we could just add a function to the Module:la-utilities that outputs the last two characters of a string; for example, if it's um it takes the neuter declension and if it's us or er it takes the masculine declension. —Μετάknowledge^{discuss/deeds} 16:05, 17 March 2013 (UTC)[reply]
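The ending check suggested here is simple enough to sketch. The following Python is only an illustration of looking at the last two characters; the function name and return values are invented, and a real module would be written in Lua.

```python
def second_declension_kind(lemma):
    """Pick a declension pattern from the last two characters of the lemma."""
    ending = lemma[-2:]
    if ending == "um":
        return "neuter"
    if ending in ("us", "er"):
        return "masculine"
    return None  # not recognisable as second declension by ending alone
```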

Just to make it clear, {{l/la}} and its relatives were created before Lua came around, and were intended to be faster than {{l}}. However, now that Lua is here, they may well be redundant because {{l}} would presumably be quite a bit faster when Lua-cised. So it's better not to change or use those specialised link templates at all until we know for sure whether they are still needed. —CodeCa t 17:52, 17 March 2013 (UTC)[reply]

Well, in the mean time I think Pengo killed two birds with one stone by improving {{l/la}} and providing a way for us to edit one template and change which module is invoked in all the other templates that need macron-stripping (eventually, all of them). If you ever want to finish figuring out the best/fastest way to {{l}}-ify, with subpages or not, then we can make a copy of {{l/la}} and replace all uses of it in the template namespace with the copy. But it doesn't look like it'll be worked out anytime soon, so IMO there's no point preserving it as is. —Μετάknowledge^{discuss/deeds} 18:05, 17 March 2013 (UTC)[reply]

What I'm worried about is backwards compatibility. If we extend {{l/la}} with this extra functionality, it will no longer be possible to replace it with {{l|la}} as easily, if and when the time comes. I strongly recommend that for the time being, the specialised templates should not have extra abilities that the general {{l}} does not also have. —CodeCa t 18:16, 17 March 2013 (UTC)[reply]

But when the time comes, {{l}} should have lang-specific functions like this. Where else would we put this kind of template? —Μετάknowledge^{discuss/deeds} 18:20, 17 March 2013 (UTC)[reply]

I mostly agree with CodeCat. Language-specific functionality belongs in language-specific templates; in this case, I suppose that would be {{la-l}} or {{la-onym}}. {{l/la}} is intended to be a hackish variant of {{l|la}}, part of a family of templates with identical behavior, and it should conform to the requirements of that family. —Ruakh_TALK 18:48, 17 March 2013 (UTC)[reply]

I agree with Metaknowledge on that point though. If {{l}} can be made to automatically strip diacritics, why not? It can probably be made to work the same as automatic transliteration (in effect, it's the same thing). —CodeCa t 19:15, 17 March 2013 (UTC)[reply]

Because all editors use {{l}}. Editors who don't usually work on Latin understand that they need to look at the documentation for (say) {{la-noun}} before using it, and that they can't just assume that it works the same way as {{en-noun}} or {{fr-noun}}; but they should be able to expect that {{l}} works the same way they're used to.
Also, there are a whole bunch of problems with that Lua module. Each of those problems could, in principle, be fixed, but I think it's reasonable to expect that clever language-specific code will always have little problems and inconsistencies, for two reasons: (1) none of us is perfect (our cleverness is in finite supply); and (2) such code almost always does, and should, optimize for the 99% case, such that it's sometimes inapplicable to rare edge cases (e.g., Latin entries that really should have macrons for whatever reason). Do we really want all of those problems to be in {{l}}? Currently, when the language-specific code is in a language-specific template, we can always fall back on using a generic template that imposes fewer requirements (e.g. using {{head}} for pluralia tantum because of a language-specific noun-headword template that "knows" that the noun lemma is a singular form); but if it's the generic template itself that has the problematic language-specific code, we're SOL.
—Ruakh_TALK 20:00, 17 March 2013 (UTC)[reply]

But there are no rare cases. AFAICT, it's 100%, not 99%. In the end, I don't really mind what you do, as long as you don't break stuff. For example, don't edit {{l/la}} without editing {{la-decl-1st}}. I would replace it, but it looks like Module talk:la-utilities/tests is currently failing, so I'm going to revert the changes to {{la-decl-1st}} for now. —Μετάknowledge^{discuss/deeds} 20:30, 17 March 2013 (UTC)[reply]

Re: {{la-decl-1st}}: Thanks. Re: there being no rare cases: I think there are always rare edge cases, or at least, that we always want to leave the door open to rare edge cases. Maybe people who send SMSes in Latin treat ō_ō and o_o as two distinct emoticons? Maybe we get a Perseus dump of 10,000 entries with macrons in their titles, and want (temporarily) to be able to link to those entries (instead of having them be enforcedly orphaned until they're all properly fixed and merged)? Maybe we'll want {{l|la||bār}} to work? I have no idea. It just seems rather extreme to impose macronlessness as a technical restriction in 100.000% of cases. —Ruakh_TALK 21:05, 17 March 2013 (UTC)[reply]

Things can be used in ways we couldn't have foreseen, and interact in ways we would never expect, so that we may need an out for reasons unconnected to the unreal and relatively tidy universe of Latin morphological rules. I firmly believe that having an override should always be the default, and that it should be removed only where experience shows it's unnecessary, and where there are compelling reasons such as performance or usability. It just seems a good idea on principle not to design things around our alleged omniscience and infallibility. Chuck Entz (talk) 22:16, 17 March 2013 (UTC)[reply]

If you insist. What really matters to me right now is that, judging by Module talk:la-utilities/tests, the module isn't working correctly yet. (PS: When I text in Latin, I never use macra. If I really need to distinguish, I use an underscore following the letter. But that's just a bit of trivia I thought I'd share.) —Μετάknowledge^{discuss/deeds} 23:25, 17 March 2013 (UTC)[reply]

It was working when I saved it. There have been many improvements made in this short time, but also someone broke it while trying to fix something I did that probably breaks conventions. As it says at the top of Module:la-utilities, to test while editing, use "Preview page with this template" with Module_talk:la-utilities/tests. I've fixed it for now, but it probably needs some work to be correct. Pengo (talk) 00:19, 18 March 2013 (UTC)

The edge cases aren't really an issue as it is: if you use two parameters it uses the old behaviour. I've been as conservative as possible with the code, so if two parameters are given, they're still both used and no macron stripping occurs (I did this originally in anticipation of performance concerns). It means for New Latin emoticons, you can still use {{l/la|ō_ō|ō_ō}}, which is a syntax that could be guessed or worked out by any user of the template in this unlikely situation. I didn't document it explicitly because I didn't think it would ever be necessary, and I'd expect them to simply use [[ō_ō]], but I'll add it to the test cases. Note, I didn't make {{la-decl-1st}} as conservative (for simplicity's sake), but it could easily be made so. Pengo (talk) 00:03, 18 March 2013 (UTC)

Maybe one of the parameters could be set to - to suppress the automatic stripping. Which of the two would be more intuitive, I don't know. —CodeCa t 00:20, 18 March 2013 (UTC)[reply]

That would be easy enough to do, but I don't think it's at all necessary, unless it's to fit in with behaviour of other {{l}} languages. And I really don't see the controversy. It's hardly surprising behaviour that a link to the Latin ācer should link to the actual entry, acer#Latin, and not to the non-existent page, ācer#Latin, as it currently does. All other existing behaviour stays the same -- {{l/la|zebra}} still links to zebra#Latin, and {{l/la|elegans|ēlegāns}} still does what it did too, and if you really want to override the macron stripping behaviour you just use two arguments, e.g. {{l/la|ō_ō|ō_ō}} although I'm yet to see a real-world example of where this would be necessary. By the way, {{l/la}} is only transcluded by a handful of pages (11 all up, while it's not being used by {{la-decl-1st}}), so the rush to protect it seems a little unwarranted. Pengo (talk) 00:37, 19 March 2013 (UTC)[reply]

At the time that I protected it, it was widely transcluded; I had no way of telling that almost all the transclusions were via {{la-decl-1st}}. Thanks for the note; I'll correct that. (BTW, regarding your earlier comment that "someone broke [the module] while trying to fix something I did that probably breaks conventions" — nope, it was just a stupid mistake on my part. Some of the unit-tests were already broken even before my changes, so when my changes broke a few more, I didn't catch on at first that I'd messed up. Sorry about that.) —Ruakh_TALK 05:35, 19 March 2013 (UTC)[reply]

Fair enough, no worries. I think half of what I thought was broken code was from some other templates/pages being reverted. Anyway, any idea how to get those last two tests to pass? Would be nice if it would accept html entities, though not sure if it's needed. Pengo (talk) 11:08, 19 March 2013 (UTC)[reply]

Language table in Lua

With Lua it seems that it would be better (and easier) to have all language information (mainly code=names mapping) in a single page/module. This is what is being worked on in Module:languages here, and I'm working on a similar thing on fr.wikt with fr:Module:langues (the actual data table is in fr:Module:langues/data).

It looks like it may be a much better way to handle languages, instead of creating several templates for every language like currently (i.e. thousands of templates in the end).

However, someone on fr.wikt asked a question about performance. Although using such a module may be more efficient for a given page, what would happen if someone changed the data table, just to add a single language? How would this impact all the pages that use this module (in this case, potentially all articles)? I asked this question at mw:Talk:Lua scripting#Lua changes and Job queue and I believe you may be interested to have this answered as well. Dakdada (talk) 15:18, 17 March 2013 (UTC)

Do we know about #mw.language.fetchLanguageName? In the lua debug console:

=mw.language.fetchLanguageName("ar")

العربية

=mw.language.fetchLanguageName("ar", "en")

Arabic

=mw.language.fetchLanguageName("ar-Arab")

Uses ISO 639 language codes, of course. —Michael Z. 2013-03-21 17:01 z

That's what I used at first when my initial table (on fr) was incomplete. There are two major issues with this:

Some languages are missing, some codes are not standard (e.g. als) and the name may differ from the ones on Wiktionaries.
It is slow when there are several names to retrieve. Loading the table in Module:languages is way more efficient (easily more than 10 times faster).

So in the last version of the module in fr, we completed the table with our 4500 current language codes and I ditched this function (although it can still be used in a secondary module). But I'm still concerned with the job queue impact, so for now we can't use this module (but several other modules are being tested with it). Dakdada (talk) 18:55, 21 March 2013 (UTC)[reply]
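The arrangement described here, a local table consulted first with the slower built-in function kept only as a secondary source, could look roughly like the following Python sketch. The table contents and names are placeholders, not the real module data.

```python
# Tiny stand-in for the full code-to-name table kept in a data module.
LANGUAGE_NAMES = {
    "ar": "Arabic",
    "fr": "French",
    "la": "Latin",
}


def language_name(code, fallback=None):
    """Look the code up in the local table; use the fallback only on a miss.

    `fallback` is any callable taking a code and returning a name or None,
    standing in for a slower lookup such as a built-in library function.
    """
    name = LANGUAGE_NAMES.get(code)
    if name is None and fallback is not None:
        name = fallback(code)
    return name
```

The point of the design is that the common case never touches the slow path, so retrieving many names in a row only costs dictionary lookups.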

Ouch. Might be worth revisiting some time. I presume (ha ha) that a native function might get optimized to perform better than anything we could write in a scripting language. Also, it might be configurable to use Wiktionary codes or names.[3] —Michael Z. 2013-03-21 19:48 z

Obviously the function was not made to be queried hundreds of times in a row. If this issue is solved then we may consider switching. Dakdada (talk) 20:51, 21 March 2013 (UTC)[reply]

How to move all of Category:Templates with /doc subpage?

Following WT:RFM#Documentation subpages to /documentation, I've added this category to all templates that still use the "old" name. Modules already use /documentation exclusively. How can these be moved automatically? I don't think bots can do moves, can they? Also, the tab at the top of the page should be changed as well (and if possible, one should be added to Modules too). —CodeCa t 17:50, 17 March 2013 (UTC)[reply]

Re: "I don't think bots can do moves, can they?": Sure they can; search /w/api.php for action=move, or check out e.g. mw:Manual:Pywikipediabot/movepages.py. But I don't know if there's any page-move analogue to the concept of a "bot edit", so it may flood recent-changes unless done very slowly. —Ruakh_TALK 17:57, 17 March 2013 (UTC)[reply]

I just realised that regular accounts can't move pages without leaving a redirect. So whichever bot is used for this, it would need administrator rights... —CodeCa t 22:00, 18 March 2013 (UTC)[reply]
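For whoever ends up running such a bot, here is a hedged Python sketch of the parameters one move request would carry. It only builds the parameter set for action=move and does not send anything; the helper name is invented, the token is assumed to have been obtained from the API beforehand, and suppressing the redirect will only work for an account with the necessary rights, as noted above.

```python
def build_move_request(title, token, reason, leave_redirect=False):
    """Build the POST parameters for moving 'X/doc' to 'X/documentation'."""
    suffix = "/doc"
    if not title.endswith(suffix):
        raise ValueError("not a /doc subpage: " + title)
    params = {
        "action": "move",
        "from": title,
        "to": title[:-len(suffix)] + "/documentation",
        "reason": reason,
        "token": token,
        "format": "json",
    }
    if not leave_redirect:
        # Ask the API not to leave a redirect behind at the old title.
        params["noredirect"] = "1"
    return params
```

A bot would then POST these parameters to /w/api.php for each page in the category, pausing between requests to avoid flooding recent changes.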

Latin first declensions in a single template

I've mashed all of Latin's first declension templates into one: {{la-decl-first}}. See the documentation for how it works and examples. It largely replaces eight similar templates, which is possible because Lua can look at what the last few characters of a parameter are. For example:

{{la-decl-first|candēla}} creates the same declension table as {{la-decl-1st|candel|candēl}}
{{la-decl-first|galaxiās}} = {{la-decl-1st-Greek-Ma|galaxi}}
{{la-decl-first|deābus}} = {{la-decl-1st-abus|de}}
etc
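The dispatch on the last few characters can be sketched as follows. This Python covers only the three examples listed above and is not the module's actual logic; the rule table and function names are invented, and passing the macronless stem to the non-default templates is an assumption.

```python
import unicodedata


def strip_macrons(word):
    """Remove combining macrons (U+0304) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0304", ""))


# Longest endings first so that "ābus" and "ās" win over plain "a".
ENDING_RULES = [
    ("\u0101bus", "la-decl-1st-abus"),
    ("\u0101s", "la-decl-1st-Greek-Ma"),
    ("a", "la-decl-1st"),
]


def first_declension_call(lemma):
    """Return the older template call equivalent to {{la-decl-first|lemma}}."""
    lemma = unicodedata.normalize("NFC", lemma)
    for ending, template in ENDING_RULES:
        if lemma.endswith(ending):
            stem = lemma[:-len(ending)]
            plain = strip_macrons(stem)
            if template == "la-decl-1st":
                # The basic template takes both the plain and macronized stem.
                return "{{%s|%s|%s}}" % (template, plain, stem)
            return "{{%s|%s}}" % (template, plain)
    return None
```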

I can't see any problems with using it as is, but some might want to wait for the dust to settle, or perhaps until second and third declension templates are done too, when we can be more certain they'll have a consistent format and parameters, or perhaps until a super-declension-template is made that encompasses them all.

The guts of the code is in Module:la-utilities. I've tried to keep presentation code separate from other code, and also tried to leave it flexible enough to accommodate the addition of future declension tables relatively easily, or other uses. It largely still uses an existing empty-table template for presentation, but someone might feel like making it build the tables from scratch internally.

I'm far from a native Lua or Latin speaker, so please let me know if there are any errors, issues or corner cases I may have missed. See the template's documentation for more information. Pengo (talk) 09:54, 19 March 2013 (UTC)

Simplification of romaji entries

Like Mandarin pinyin at some stage, Japanese rōmaji entries need to be converted to soft redirects to hiragana and katakana entries (not directly to kanji, as hiragana serves as disambiguation for multiple Japanese homophones). This is the outcome of the discussion we had at Wiktionary:Beer_parlour/2013/February#Stripping_extra_info_from_Japanese_romaji.

I wonder if it's doable via a bot. There are too many entries in Category:Japanese romaji, which have PoS headers and don't use {{ja-romaji}} template. Generating new ones is perhaps straightforward but not conversion.

This is how the romaji entries will look, (the only category they belong to is Category:Japanese romaji). Copying from Wiktionary:About_Japanese#Romaji_entries:

A hiragana only example: "tsuku"

==Japanese==

===Romanization===
{{ja-romaji|hira=つく}}

A katakana only example: "rūto"

==Japanese==

===Romanization===
{{ja-romaji|kata=ルート}}

A hiragana and katakana example: "ringo"

==Japanese==

===Romanization===
{{ja-romaji|hira=りんご|kata=リンゴ}}

--Anatoli ^{(обсудить}/^вклад) 04:46, 20 March 2013 (UTC)[reply]
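Generating new entries in this format is the straightforward part, and a bot could do it along these lines. The Python below is only a sketch that reproduces the three layouts shown above; the function name is hypothetical and nothing here handles the harder job of converting existing entries.

```python
def romaji_entry(hira=None, kata=None):
    """Build the wikitext of a romaji soft-redirect entry."""
    if not hira and not kata:
        raise ValueError("at least one of hira or kata is required")
    args = []
    if hira:
        args.append("hira=" + hira)
    if kata:
        args.append("kata=" + kata)
    return (
        "==Japanese==\n"
        "\n"
        "===Romanization===\n"
        "{{ja-romaji|" + "|".join(args) + "}}\n"
    )
```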

For comparison, Japanese rōmaji will work similarly to Category:Mandarin pinyin. The debate about the Japanese rōmaji was resolved without a vote (see Wiktionary:Votes/2011-07/Pinyin entries for the vote on Mandarin pinyin). The vote actually prescribed NOT to add any definitions but some, especially old monosyllabic have definitions. With Japanese rōmaji we decided, not to have any definitions at all, only soft redirects. --Anatoli ^{(обсудить}/^вклад) 04:53, 20 March 2013 (UTC)[reply]

Is there potentially information in romaji entries that would be lost if a bot went through and deleted everything? DTLHS (talk) 05:33, 20 March 2013 (UTC)[reply]

In theory, no, as all the information on romaji entries is essentially duplicated in the corresponding kana entries. This was a large part of the decision to simplify, since romaji entries have basically just been disambiguation pages created as dupes of the kana pages to aid users who don't yet read kana.

In practice, there may be cases where the romaji entry was developed but the kana entry has not been. Provided the romaji entry information is good, I think that wikicode can just be copy-pasted to the corresponding kana entry, and Bob's your uncle. -- Eiríkr Útlendi │ Tala við mig 05:41, 20 March 2013 (UTC)[reply]

In the pinyin vote we also had a rule not to add any pinyin entry if the hanzi entry didn't exist. This rule is followed. There are some entries in Category:Mandarin pinyin entries without Hanzi with both blue and red links but none with "just red". It's a good idea not to create rōmaji before the real Japanese entry exists. I don't know if this rule should be enforced, but what's the point of a redirect to nothing, or of spending time adding all definitions and other info to a transliteration entry? The converted entries can be viewed in the history if anything valuable is lost. Fine by me. Let's encourage work on real Japanese and save time. --Anatoli ^{(обсудить}/^вклад) 05:50, 20 March 2013 (UTC)

While cleaning up some categories (suffixes, counters: done), I found wa-ga, which has no kana entry (わが) although the kanji entry exists (我が). Will convert/create this one but no need to worry if some are lost. Pity the creator didn't bother to create a hiragana entry. --Anatoli ^{(обсудить}/^вклад) 05:55, 20 March 2013 (UTC)

wa-ga should be waga anyway... :) -- Eiríkr Útlendi │ Tala við mig 06:08, 20 March 2013 (UTC)[reply]

Sorry, whatever you proposed doesn't work. Entries must have definitions, otherwise AutoFormat will go and tag them as having no definition. -- Liliana • 16:08, 20 March 2013 (UTC)[reply]

Really? What about the thousands of Category:Mandarin pinyin entries? To keep your KassadBot from picking them up, a line of the form "# See ..." is used.

@Eirikr. I made waga as well. --Anatoli ^{(обсудить}/^вклад) 20:12, 20 March 2013 (UTC)[reply]

I agree with Liliana that each Romaji entry should have a line starting with "#" in the wiki code, which is currently not the case at tsuku. Unlike tsuku, Pinyin biǎomiàn does have a line starting with "#": # {{pinyin reading of|表面}} surface. With Romaji, you had better follow the model of Pinyin as closely as possible rather than introducing a different format that uses "See also". Moreover, this dramatic change of treatment of Romaji should go through a vote. I oppose making this dramatic change without a vote. --Dan Polansky (talk) 22:23, 20 March 2013 (UTC)

The new line and # at the beginning are generated by the template. Mandarin and Gothic romanisation entries follow exactly the same pattern: they are soft redirects. The topic has been in the Beer Parlour for a long time with {{look}} to attract input, and the most active Japanese editors, User:Haplology and User:Eirikr, responded positively and are already using it. The rationale was explained, but I repeat it briefly:

The structure of using Romaji as an index (soft redirect) follows the structure of Japanese dictionaries. Users use "tsuku" to get to "つく". There is no duplication of information.
All the information in the rōmaji entries is contained in hiragana and katakana entries, only one click away.
Roman script is not the correct script for the Japanese language, it's only romanisation. No need to mislead users that romaji is a replacement for the Japanese writing system.
Currently, Japanese romanisation is the only exception (to my knowledge) from other languages. All languages have entries in their native scripts only, if they are not used in other scripts - i.e. Russian is only in Cyrillic, Arabic - only in Arabic. Romanisation entries are helpful only to find entries in their proper form, they are not nouns, verbs, they are romanisation.
Maintenance hell, mismatch between entries, missing Japanese entries when romanisation entries exist.

Dan, if you wish, set up a vote, but since Japanese editors agreed to this method, I don't see a reason. When you opposed the vote on Mandarin pinyin you used Japanese romaji as a reason to vote against it; what's your reason this time? You're not going to maintain Japanese romaji entries, are you? --Anatoli (обсудить/вклад) 23:03, 20 March 2013 (UTC)[reply]

Re Mandarin pinyin entries, the vote on pinyin explicitly disallowed any definitions (i.e. English translations in the entries), only links to hanzi (Chinese characters) - "a pinyin entry have only the modicum of information needed to allow readers to get to a traditional-characters or simplified-characters entry". (I was neutral on this rule). See "yánlì", which was used for the vote. This rule wasn't strictly followed in some cases, but if it's causing confusion, then we might need to remove all English translations from Mandarin romanisation entries. Anyway, removing definitions was suggested by Eirikr, supported by Haplology and I agreed.

"yánlì" entry from pinyin vote:

==Mandarin==

===Romanization===
{{cmn-pinyin}}

# {{pinyin reading of|trad=嚴厲|simp=严厉|lang=cmn}}
# {{pinyin reading of|trad=妍麗|simp=妍丽|lang=cmn}}
# {{pinyin reading of|trad=沿例|simp=沿例|lang=cmn}}
# {{pinyin reading of|trad=岩櫟|simp=岩栎|lang=cmn}}
# {{pinyin reading of|trad=沿歷|simp=沿历|lang=cmn}}

--Anatoli (обсудить/вклад) 23:10, 20 March 2013 (UTC)[reply]

I am saying that if you plan to do sweeping content-removing changes in all Romaji entries, as you do, you should provide evidence of consensus of all interested editors rather than just those active on Japanese entries, in the form of a vote. In this discussion, a particular proposal on formatting has met opposition from an editor who does not edit Japanese for the most part. As I do not know what your proposal entails exactly, I cannot create the vote for you. If you plan to forbid definitions from Romaji entries, that should very clearly be stated in the vote, not the implicit way it was done in the Pinyin vote; the actual practice in Mandarin romanization does not actually remove all definitions, as you have pointed out. Lack of express opposition in Beer parlour is not good enough evidence of consensual support of drastic sweeping changes. Letting the topic sit in Beer parlour for a long time is just a waste of time. {{look}} hardly ever attracts any input, as you should know by now, being an experienced Wiktionary editor; the template could as well be deleted as far as I am concerned. Furthermore, I am far from sure you are correct in your estimate of what has and has not reached support of the most active editors of Japanese entries, as I have found the following statement made by Eiríkr Útlendi: "I must therefore strongly oppose any move to strip romaji and / or kana entries of POS and gloss information." --Dan Polansky (talk) 10:30, 23 March 2013 (UTC)[reply]

If you had actually read through the discussion instead of merely looking for ammunition, you would know that Eiríkr was reacting to his original perception of an earlier form of the proposal. After clarifications, discussions and further development, he came to be a proponent of the resulting concept. I'm not saying anything about your main point, just your misuse of Eiríkr's comment. Chuck Entz (talk) 14:48, 23 March 2013 (UTC)[reply]

Sure, I should carefully read through the whole discussion to find out what is actually being proposed, and who supported what at what points of time, and what changes of opinion occurred, while you cannot be bothered to write up a clear proposal and produce evidence of its being supported. --Dan Polansky (talk) 00:18, 24 March 2013 (UTC)[reply]

Here again, a quick scan through the posts would have shown that it was my only post on this topic. It's not my proposal, so I have no obligation to write it up. My only connection to it consists of having read everything as it was posted, and being mildly annoyed at your jumping in and assuming things without checking. Chuck Entz (talk) 01:12, 24 March 2013 (UTC)[reply]

I admit that I could have been reading more carefully, skimming more slowly, and being more attentive overall. Nonetheless, I still think that I should not have to wade through a fairly long discussion to see whether there is or there is not a consensus, on what the consensus is, and how many people have been involved in the discussion. --Dan Polansky (talk) 01:58, 24 March 2013 (UTC)[reply]

Um, I usually skip to the bottom and read the last few paras to find out how things turned out. I know my wife does this with novels. Had you done so, you would have seen my comment:

@Anatoli, the new {{ja-romaji}} looks great from an editor and user usability standpoint. Barring any concerns voiced by other editors, I think this thread has reached a successful conclusion.

Even without reading anything else, "this thread has reached a successful conclusion" might be a hint that consensus, or at least broad agreement, had been found... -- Eiríkr Útlendi │ Tala við mig 04:40, 24 March 2013 (UTC)[reply]

Module:IPA

I made a very basic IPA > X-SAMPA transliterator at Module:IPA. Needs work.

Also relevant: Wiktionary:Beer_parlour/2013/January#(X)SAMPA —Michael Z. 2013-03-20 21:05 z
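For readers who have not opened the module, a character-by-character transliterator of this kind can be sketched in a few lines. The version below is in Python rather than Lua and is not the code at Module:IPA; the table `IPA_TO_XSAMPA` and the function `ipa_to_xsampa` are names invented for this illustration. The table covers only a handful of symbols, and anything not listed (including diacritics and tie bars, which would need multi-character handling) is passed through unchanged.

```python
# Hypothetical, partial IPA -> X-SAMPA table; a real one would be much longer.
IPA_TO_XSAMPA = {
    "ə": "@",
    "ʃ": "S",
    "ʒ": "Z",
    "θ": "T",
    "ð": "D",
    "ŋ": "N",
    "ɪ": "I",
    "ʊ": "U",
    "ɛ": "E",
    "ɔ": "O",
    "ɒ": "Q",
    "æ": "{",
    "ɹ": "r\\",
    "ː": ":",
    "ˈ": '"',
    "ˌ": "%",
}


def ipa_to_xsampa(ipa):
    """Replace each known IPA character; leave unknown characters as they are."""
    return "".join(IPA_TO_XSAMPA.get(char, char) for char in ipa)
```

Under these assumptions, `ipa_to_xsampa("ˈkɹɒsɪŋz")` gives `"kr\QsINz`. Because the mapping is a plain lookup table, the same data could in principle feed either a server-side module or a client-side gadget.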

I didn't know you could write table keys in that way... —CodeCat 21:37, 20 March 2013 (UTC)[reply]

CodeCat : it's right here mw:Extension:Scribunto/Lua_reference_manual#table.

By the way, is the usefulness of X-SAMPA accepted here on en.wikt ? On fr.wikt we chose to move everything in a gadget (even then I don't think anyone uses it).

But if you need a list, check out the gadget list here : fr:MediaWiki:Gadget-APIversXSAMPA.js (not sure if it is complete though). Dakdada (talk) 21:43, 20 March 2013 (UTC)[reply]

I think Lua is preferred to a gadget, though, because it runs on the server. —CodeCat 22:03, 20 March 2013 (UTC)[reply]

Putting X-SAMPA in a Lua module would have a cost, as it would be loaded in every page with IPA. I'm not sure it is worth it, given very few people actually use it (if any). Gadgets are a good way to give users the IPA to X-SAMPA conversion, since only the people who want to use it would load the gadget from the site. Although that way we assume that the people who absolutely want to read ASCII pronunciations have JavaScript enabled... Dakdada (talk) 23:16, 20 March 2013 (UTC)[reply]

Has anyone figured out how to import a table with mw.loadData? This would let the server load the transliteration table once only, in read-only mode, even if there were many instances of IPA on a page. I couldn’t get it to load a table with Unicode data.

I did use it but I have to admit that I did not compare it to a simple require to see if the data was really cached. Dakdada (talk) 10:27, 21 March 2013 (UTC)[reply]

Where can I see your code? —Michael Z. 2013-03-21 15:22 z

The module is here (sorry it's in French): fr:Module:langues, with the table in fr:Module:langues/data. As an example, the page fr:Utilisateur:Darkdadaah/eau/Pamputt can be created within 0.5s with mw.loadData. When I replace it by require, the page is built in 8 seconds, with twice as much memory used. Dakdada (talk) 18:45, 21 March 2013 (UTC)[reply]

X-SAMPA could be incorporated into {{IPA}}. I’d like to see a gadget that shows only IPA by default, and lets the reader toggle IPA/X-SAMPA display, or copy X-SAMPA. Less clutter on the page for the 99.999% of us who have no use for X-SAMPA. —Michael Z. 2013-03-21 01:00 z

It would be easier if both IPA and X-SAMPA were created with a single template. Right now it's something like {{IPA|}}, {{X-SAMPA|}} so just hiding one or the other would leave an ugly comma. Dakdada (talk) 10:27, 21 March 2013 (UTC)[reply]

Yes, exactly. If X-SAMPA can be reliably derived from IPA, then it can be there every time, and no need for a separate template. But seeing as we know of zero users of X-SAMPA, there’s no need to show it to everyone at all. Any ideas for an unobtrusive interface? —Michael Z. 2013-03-21 15:22 z

I suggest creating a JavaScript tool based on Module:IPA (which is pretty easy) and removing Template:X-SAMPA; then anyone who wants to see X-SAMPA pronunciation in entries may activate the JS code. --Z 02:22, 1 April 2013 (UTC)[reply]

Error when moving a page?

I'm trying to move avantpaísos to avantpaïsos without leaving a redirect. But when I try, I get an error like this: [6560d38b] 2013-03-20 21:31:29: Fatal exception of type MWException. Is anyone else able to do the move? —CodeCat 21:32, 20 March 2013 (UTC)[reply]

Apparently I'm getting the same error with other pages I try to move. —CodeCat 21:35, 20 March 2013 (UTC)[reply]

Not me; I've tried and failed. Mglovesfun (talk) 21:53, 20 March 2013 (UTC)[reply]

Same here. SemperBlotto (talk) 22:07, 20 March 2013 (UTC)[reply]

avantpaís is displaying a script error. This needs fixing urgently. Mglovesfun (talk) 22:43, 20 March 2013 (UTC)[reply]

Bot to add {{head|en}} to Category:English plurals

Bot to do this:

==English==

===Noun===
'''crossings'''

# {{plural of|crossing}}

to

==English==

===Noun===
{{head|en}}

# {{plural of|crossing}}

The regex is pretty simple. I can do it using the regex function on AWB, but AWB also cuts off at 25,000 for categories, so I could only go as far as that. Perhaps MewBot (talk • contribs) would like to take this one? Nevertheless, it could be done for other languages and also for verb forms, adjective forms and so on. Mglovesfun (talk) 11:08, 22 March 2013 (UTC)[reply]
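A minimal sketch of such a substitution, written in Python rather than as an AWB rule: the pattern and the function name `add_head_template` are illustrative assumptions, not an existing bot's code. It assumes the bolded headword sits directly under ===Noun=== and is followed by a {{plural of}} sense line, and it does not check that the section is under ==English==, which a real bot would need to do.

```python
import re

# Bolded headword directly under ===Noun===, followed by a {{plural of|...}} sense line.
PLURAL_HEADWORD = re.compile(
    r"(===Noun===\n)'''[^'\n]+'''(\n+# \{\{plural of\|)"
)


def add_head_template(wikitext):
    """Swap the bolded headword line for {{head|en}}; leave everything else alone."""
    return PLURAL_HEADWORD.sub(r"\1{{head|en}}\2", wikitext)
```

Applied to the "crossings" entry above, this replaces only the '''crossings''' line; running it a second time changes nothing, since the bolded line is no longer there to match.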

I would prefer another approach, which I was just about to suggest when I saw this. It's my preference that form-of templates like {{plural of}} don't add part-of-speech categories to the entries. It makes sense to me because we already have, as a rule, headword-line templates that add PoS categories, so this makes it more consistent. But there are other reasons as well. In many cases, the form-of templates end up being added to other kinds of entries and other languages, but in those cases it may not be appropriate to have a category. With {{plural of}} this is particularly noticeable because the category it places words in, Category:English plurals, isn't very clearly named because it doesn't say plurals of what. In a language like Catalan, such a name would not be appropriate, because Catalan also has plural adjectives and plural verbs. Yesterday I cleaned out Category:Catalan plurals, which (not surprisingly) contained several adjective plural forms as well. Some templates, including this one, allow you to suppress the category or change its name, but that seems like putting the cart before the horse. Catalan already has a {{ca-noun-form}} template which places the entry in the most appropriate category, so why would we need to add nocat=1 every time we use {{plural of}} for Catalan? That seems backwards. Therefore, I propose this replacement instead:

==English==

===Noun===
{{en-noun-form}} [or {{en-noun-plural}}]

# {{plural of|crossing|lang=en|nocat=1}}

The headword-line template, which we would need to create, would add the plural category instead. So nocat=1 is added to suppress the category of {{plural of}}, which in turn would make it easier for us to find out how many entries still rely on its categorisation. It is my hope that once all instances of {{plural of}} have this parameter, we can remove the categorisation code from the template safely. —CodeCat 13:51, 22 March 2013 (UTC)[reply]
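One way a bot might find the entries that still rely on the categorisation of {{plural of}} is to list the calls that lack nocat=1. The sketch below is a hypothetical Python helper (`calls_missing_nocat` is an invented name), and it assumes the template call contains no nested templates.

```python
import re

# Matches {{plural of|...}} calls whose parameters contain no nested braces.
PLURAL_OF_CALL = re.compile(r"\{\{plural of\|([^{}]*)\}\}")


def calls_missing_nocat(wikitext):
    """Return every {{plural of}} call in the text that does not pass nocat=1."""
    missing = []
    for match in PLURAL_OF_CALL.finditer(wikitext):
        params = [param.strip() for param in match.group(1).split("|")]
        if "nocat=1" not in params:
            missing.append(match.group(0))
    return missing
```

An empty result for a page would mean that the page no longer depends on the form-of template for its category.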

What is the advantage of doing either of these over the current situation of a plain wikitext inflection line and categorization by {{plural of}} (presumably eventually to be replaced by {{en-plural of}})? Uniformity? That seems like a positive hazard as it seems to lead folks to believe that they know how to make changes to English entries when the evidence leads me to believe they don't.

There are quite a few cases where the inflection line is for a lemma and {{plural of}} does categorization at the sense line level. DCDuring TALK 15:39, 22 March 2013 (UTC)[reply]

Why would you want to do this? Why replace simple code with a template that does nothing? Are you trying to make the wiki run even slower? SemperBlotto (talk) 15:41, 22 March 2013 (UTC)[reply]

I can't see any advantages. Intentional redundancy? CodeCat, you're normally the first to want to get rid of redundancy (even before me). Mglovesfun (talk) 16:37, 22 March 2013 (UTC)[reply]

Redundancy isn't really an issue here, it's about what is workable. If thousands and thousands of uses of a template need a nocat=1 parameter just to stop it from doing something, then that seems like bad design. And when people come across something that is badly designed, they're going to try to work around it, which may make things worse. For example, I've seen lots and lots of entries that have tried to avoid the categorisation of {{plural of}} by instead using {{form of|plural}}. Others, like I mentioned, ignored the category, with the result that at least for Catalan entries, Category:Catalan noun forms and Category:Catalan plurals contained almost the exact same entries. The only differences between them were either a few entries that lacked {{ca-noun-form}}, or entries that used {{plural of}} for adjectives (which is totally intuitive; it's the category that's wrong!). That can't be a good thing. My proposal helps to make things consistent by sticking to a simple rule that most non-form entries already adhere to: the headword-line template is responsible for the PoS category. {{head}} already works that way, as do the many language-specific templates like {{en-noun}}. I think that is a very simple rule, and if we can achieve a situation where our templates follow it, it will make things easier to understand because editors will know exactly which templates they can expect to add an entry to a category and which not, which avoids errors due to uncertainty. I mean, think about this yourself... would you rather have to remember for each template whether it categorises or not, or would you prefer learning a simple rule? —CodeCat 17:28, 22 March 2013 (UTC)[reply]

I have never used "nocat=1" (it is not obvious to me what it is supposed to do), so I have never had to remember what to do. I have no idea which English, French, Italian, Latin or German templates allow such a keyword. SemperBlotto (talk) 17:33, 22 March 2013 (UTC)[reply]

That is exactly what I am talking about above. Consistency in how similar templates work is good. It means that once we learn to expect certain behaviour, we can extend that expectation to new templates with reasonably safe knowledge that it will do as we think. Consider another example, overriding the headword of a headword-line template. The majority of our templates use head= for that, so many of us (myself included) would just use head= without even thinking about it. We expect it to work. Similar for linking templates, which take a second parameter to change the displayed link text. Nobody thinks about it, everyone just expects it to work. And that is a good thing because it lessens the mental burden of remembering how all the templates work. My proposal is intended to be just one step towards that. —CodeCat 17:41, 22 March 2013 (UTC)[reply]

[Aside: here’s a working link to this section: Bot to add {{head|en}} to Category:English plurals. —Michael Z. 2013-03-22 18:43 z]

Why not modify {{en-noun}} so it works for plurals? Something like {{en-noun|pl}}. Every time I create an English plural entry, I spend five minutes previewing {{en-noun|-}}, {{en-noun|!}}, {{en-noun|?}}, read the docs again, and then give up and leave it for someone else to clean up. As an editor, I don’t care which template adds the category.

Why replace simple code with a template that does nothing? Code that is completely inconsistent with every other noun entry is not simple, it is obscure. —Michael Z. 2013-03-22 20:16 z

There would need to be a way to distinguish a plurale tantum/plural-only noun (a lemma that happens to be plural) from a plural form of a regular singular noun. We wouldn't want pants categorised as a noun plural form, I think? I think adding such functionality to {{en-noun}} is dangerous, because with misuse we could end up with plurals categorised in Category:English nouns. Having a separate template seems like a safer option, and it also fits with the general idea that each part of speech has its own template (for categorisation purposes, noun forms are their own part of speech, distinct from nouns). Of course, just writing {{head|en|plural}} or {{head|en|noun form}} is a possibility too. —CodeCat 21:07, 22 March 2013 (UTC)[reply]

Please don't mess with {{en-noun}}. A few templaters have said that {{en-noun}} was one that was sufficiently complicated already so that they didn't want to add features. We have {{en-plural noun}} already as an inflection-line template. It is also inappropriate in the cases mentioned above in which the same headword is both a plural-only and a simple plural. {{head|en}} and hard categorization seem adequate for that case and other exceptional cases that arise.

I see some advantage to creating an English-specific direct sense-line replacement for {{plural of}}. If all other languages want to use an inflection-line approach or a language-specific sense-line approach, then by all means let there be language-specific and generic templates to do so. As the Little Red Book said, let a thousand flowers bloom!!! DCDuring TALK 22:27, 22 March 2013 (UTC)[reply]

Maybe this is an opportunity to try out a single template for headword and sense line(s), incorporating HTML dfn and dl. —Michael Z. 2013-03-22 23:04 z

Yes. It's probably long past time to kill off this idea of wiki-style participation here. I say let there be an apprenticeship period, no edits from non-whitelisted users without approval, etc, qualifying exams for would be template writers, HTML and CSS qualifying exams for adminship. DCDuring TALK 23:34, 22 March 2013 (UTC)[reply]

You don’t think one well-designed template could be made more accessible for editors than two vaguely unrelated templates? —Michael Z. 2013-03-23 01:10 z

Not for the relevant group of editors for a language or a related group of languages. There is apparently a typical, cognitively economical way of presenting an inflection line for a given language or language family or larger grouping, based on the characteristics of the language, its script, and the PoS in question. Ordinary wikitext should work for formatting almost all languages that use Latin script. The content portion of such templates seems to be a matter of language knowledge. The uniformitarian urge to template-standardize annoys me no end and seems completely contradictory to what a wiki is supposed to be. DCDuring TALK 01:49, 23 March 2013 (UTC)[reply]

What puzzles me is why you think using fewer templates is inherently better. What makes '''word''' or [[word#English|word]] better, in your opinion, than {{en-noun}} and {{l|en|word}}? In particular, what I am confused about is why you make such a sharp distinction between "ordinary wikitext" and "templates". To me, they're the same thing, one is part of the other, and they have to be learned as a single whole. —CodeCat 02:01, 23 March 2013 (UTC)[reply]

Overhead. I disagree with him here, but I see his point: when all you want is the headword in italics, it's a touch Rube Goldbergish to have a bunch of templates calling other templates to end up with the same result. It's very easy (especially now, as we're marching into the brave new world of Lua modules) to become enamored of our template cleverness. Templates are merely a tool for producing the right output on the web page. They are often extremely handy, and do wonderful things, but they also have costs. Otherwise, why not have a single template called {{template}}? It would have 47 positional parameters, and 73,953 named parameters, half of which would be for turning other parameters on or off and/or feeding them ad-hoc faked inputs. Maybe we can talk Daniel Carrero into designing it for us. Chuck Entz (talk) 03:19, 23 March 2013 (UTC)[reply]

Re: "Templates are merely a tool for producing the right output on the web page": That's not true. They're also a tool for marking up the wikitext itself with semantics usable by mirrors, bots, and other external tools. Re: "when all you want is the headword in [boldface], […] ": But is that all we want? Mzajac and CodeCat actually want the headword to have richer styling than that. (You talk about "the right output on the web page", but of course, the output consists of more than just what you see when you open your Web browser. It also consists of what your browser sees, what search-engines see, what screen-readers see, and so on. None of this requires templates — anything that can be put in a template could also be put directly in the entry's own wikitext — but templates are often simpler for everyone.) —Ruakh_TALK 17:21, 23 March 2013 (UTC)[reply]

By the way, this has long since gone from a Grease Pit discussion to a Beer Parlor one. Chuck Entz (talk) 03:22, 23 March 2013 (UTC)[reply]

That's true, but that's also how programs tend to be written these days, it's the whole idea behind encapsulation and object-oriented programming (specifically, the idea of "hiding" details that programmers should not concern themselves with). Writing raw wikitext everywhere seems to me like writing in raw assembly language; yes it works, yes it's fast, and it's clear what everything does right down to the finest details, but it's anything but practical, it offers no consistency and it's a big problem to maintain it. My goal is to use templates as a tool to both get the result we desire and to make things more intuitive and easy to use. To me, "always use a headword-line template" is more intuitive than "use a headword-line template for non-English, use bolded text with an explicit category for English, except when there's already a headword-line template... oh and you can use bolded text for non-English too but that's really wrong", and "the headword-line adds the PoS category" is more intuitive than "the headword-line sometimes adds the category, but not always, and sometimes the form-of template does, so be sure to check whether one of the templates adds the category you want". I also think "language is always required" is more intuitive than "language is required except usually for English" (the best evidence for that unintuitiveness is the fact that there are countless context labels that put foreign entries into English categories). —CodeCat 15:08, 23 March 2013 (UTC)[reply]

@CodeCat: There are three types of potential bad consequences of using templates in cases where there is no clear functional benefit. First, there are performance effects. Each template contributes to latency. Complex templates that call other templates to confirm that a language uses Latin script and are multiply transcluded on a page should be an embarrassment. Second, there are the template-editing consequences. A template that is transcluded more than a hundred thousand times (we have 72 of them) is not easily changed without bad consequences, especially as we don't have any good test case suite AFAICT. The most minimal change in one of the million-transclusion templates (we have 7) can prevent other template changes from going into effect for hours. Third, and worse, is the effect on contributors. I expect that most users are simply intimidated by the complexity of our template system. It is hard enough just to use them correctly, let alone use all the features, let alone amend them. Instead of having best practices documented so any earnest contributors could create a template likely to be useful, we attempt to impose uniformity.

The relative lack of good documentation for our overall system makes it extremely likely that the loss of a few technically skilled contributors will bring this wiki down. DCDuring TALK 16:30, 23 March 2013 (UTC)[reply]

A lot of good points here. Also as yet unmentioned is the fact that we are making web pages that aspire to conform to the HTML standard of structured text representation. Wikitext, when used with disregard for the HTML it produces, is worse than MS Word. It makes this discussion-point you are reading a <dd> in 15 nested one-item <dl>’s. When we update our internals, we can take advantage of the opportunity to try to improve the horrific tag soup generated by this website. —Michael Z. 2013-03-23 17:20 z

Reminder of Lua help session in a few hours

Hi! This is a reminder: today at 1800 UTC, in about three hours, there's a Lua/Scribunto help session on IRC; please see the IRC office hours page on meta for details. Thanks! Sharihareswara (WMF) (talk) 15:05, 22 March 2013 (UTC)[reply]

Soft Keyboard, Please?

What happened to the soft keyboard? I'm missing it very much. ---- Lo Ximiendo (talk) 03:26, 23 March 2013 (UTC)[reply]

I have also noticed some weirdness. — Ungoliant (Falai) 03:52, 23 March 2013 (UTC)[reply]

What is a soft keyboard? SemperBlotto (talk) 08:16, 23 March 2013 (UTC)[reply]

I thought it was one of those keyboards made of a soft silicone that could be rolled up (like this one), but if Lo Ximiendo is missing hers, she would hardly be asking us about it. Perhaps she means a virtual keyboard? —Angr 10:02, 23 March 2013 (UTC)[reply]

“Software keyboard,” by elision. —Michael Z. 2013-03-23 17:22 z

I assume that she means that row of icons and text immediately above the edit window (does anyone ever use it?). I notice today that something flashes up temporarily just before it is displayed. SemperBlotto (talk) 17:27, 23 March 2013 (UTC)[reply]

I mean that I'm unable to type either Cyrillic or Arabic letters with the virtual keyboard that's provided to us (now in an unusable shambles, to me). ---- Lo Ximiendo (talk) 19:19, 23 March 2013 (UTC)[reply]

I think you mean in the Edit window, just above the window where it says, >Advanced, >Special characters, >Help. When you click on Special characters, you can select from various keyboards, such as IPA, Arabic, or Cyrillic. For me, it seems to be working fine. I don’t know if the skin has anything to do with it, but I still use the Monobook skin. —Stephen (Talk) 21:50, 25 March 2013 (UTC)[reply]

Oddly enough, I can still type with the deformed virtual keyboard. It's just that I miss the old appearance of the edit window. (Can I make a screenshot of my situation?) --Lo Ximiendo (talk) 22:24, 25 March 2013 (UTC)[reply]

This works fine for me too on en.wiktionary.org, using Firefox 18.0.2 on Linux. Is the problem similar to this screenshot? If so it might be the same problem as bugzilla:46401/bugzilla:46575. If it's not, could you please tell us which browser software (and version) you use and on which operating system? If your browser has a "JavaScript Console" or "JavaScript Debug Window", is any output created in that console when loading the "Edit" page and also when you try to insert Arabic or Cyrillic letters? And yes, a screenshot would also be helpful in that case. :) Thanks! --AKlapper (WMF) (talk) 22:02, 26 March 2013 (UTC)[reply]

The problem I'm having (and it's apparently gone, maybe for now at least) is similar to the screenshot you showed (and I'm using version 11 or 12 of Firefox). Thanks for showing me a Bugzilla entry! :) --Lo Ximiendo (talk) 22:10, 26 March 2013 (UTC)[reply]

Printing pages with PoScatboiler

I am getting grossly non-wysiwyg printer results from printing Category:English phrasal verbs. The text and templates embedded in {{poscatboiler}} for that page print the url for the edit link and for each letter in the index box. Thus there is a largely extraneous first page. "Printable version" creates the same appearance on the screen.

Similar problems appear on principal namespace pages that have urls in {{quote-book}}. I haven't checked further. DCDuring TALK 19:41, 23 March 2013 (UTC)[reply]

Do you think you could show an... um... "screenshot"? —CodeCat 21:08, 23 March 2013 (UTC)[reply]

It seems to be {{categoryTOC}}:

File:categoryTOCprinted.pdf Chuck Entz (talk) 21:33, 23 March 2013 (UTC)[reply]

But also urls contained in {{quote-book}} and even just inside single square brackets in the same way. This is probably considered desirable behavior for {{quote-book}} and single square brackets, but it is not for the index box. DCDuring TALK 21:41, 23 March 2013 (UTC)[reply]

True, but we have to start with a specific example so there's something to look at. My guess is that there's code to hide the URLs that doesn't work when printing. Chuck Entz (talk) 21:55, 23 March 2013 (UTC)[reply]

You are probably right. At WP "printable version" displays urls that are in a single square bracket. BTW, if it only occurs for printing, it is unlikely to be urgent for anyone. DCDuring TALK 22:34, 23 March 2013 (UTC)[reply]

The MW print style sheet contains the following:

#content a.external.text:after,
#content a.external.autonumber:after {
	/* Expand URLs for printing */
	content: " (" attr(href) ") ";
}

Presumably this could be overridden by some CSS in the offending template (which I couldn't find using our documentation) or better in the offending class of templates. I assume it is necessary to have the full url to link to a specific portion of a category, but is it? DCDuring TALK 13:20, 25 March 2013 (UTC)[reply]

It looks like that table of contents is generated by {{en-categoryTOC}}, and perhaps other members of category:TOC templates. It makes those links with the fullurl parser function. No idea why that should give it the class .external.

I guess we can override that with something like #toc a.external.text:after { content:"" } in the right stylesheet. (Is this a general print stylesheet, or is it specific to Vector?). —Michael Z. 2013-03-25 16:48 z

I have the problem in Monobook. I stuck the line in my common.css and it did the trick, selecting what needed to be removed, but not the other urls. For my purposes, I rarely (never?) need the urls. I don't know who else prints this kind of thing. I only do it when I need a medium/small list as a checklist. DCDuring TALK 17:58, 25 March 2013 (UTC)[reply]

It doesn't seem to work with 2 non-Latin script category ToCs that I tried (one was Sanskrit), so it is not a general solution. But it suits me fine. DCDuring TALK 18:05, 25 March 2013 (UTC)[reply]

Which templates, DCD? We may as well fix this for everyone. —Michael Z. 2013-03-25 18:54 z

Okay, I see that {{categoryTOC-Devanagari}} is an example. This template is not built on poscatboiler, but is custom-built and inserted into a category page. There may well be dozens of templates with their own variation of code. We should standardize the HTML and id or class names for such T’s of C. —Michael Z. 2013-03-25 19:00 z

I see that these TOCs use class="plainlinks" to prevent the little “external link” arrows from showing up. This is probably a good indicator that a URL should not be printed either. This CSS might be more generally applicable: .plainlinks a.external:after { content:""; }. For good measure, it should probably be in an @media print { } block. Needs testing to see if the rule is specific enough to override the other. —Michael Z. 2013-03-25 20:28 z
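Whether one rule overrides another depends first on selector specificity, which can be estimated by counting ids, classes and element names. The Python sketch below is a rough counter for simple selectors like the ones quoted in this thread; `specificity` is an invented helper, it ignores :not() and similar cases, and it would miscount attribute selectors whose values contain dots or colons, so treat the result as an approximation.

```python
import re

# Pseudo-elements that may be written with a single colon.
LEGACY_PSEUDO_ELEMENTS = {"after", "before", "first-line", "first-letter"}


def specificity(selector):
    """Return an (ids, classes, elements) tuple for a simple CSS selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    attributes = len(re.findall(r"\[[^\]]*\]", selector))
    pseudos = re.findall(r"(::?)([\w-]+)", selector)
    pseudo_elements = sum(
        1
        for colons, name in pseudos
        if colons == "::" or name in LEGACY_PSEUDO_ELEMENTS
    )
    pseudo_classes = len(pseudos) - pseudo_elements
    # Type selectors: a name at the start or right after a combinator.
    elements = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes + attributes + pseudo_classes, elements + pseudo_elements)
```

By this count, `#content a.external.text:after` scores (1, 2, 2) and `.plainlinks a.external:after` scores (0, 2, 2), so the second rule on its own would not win against the first; a selector that includes an id, such as `#toc a.external.text:after` at (1, 2, 2), ties with it and is then decided by source order.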

This seems like a practical, if perhaps tedious, demonstration of the benefits of the use of CSS (and its allies) and of compliance. I rest your case. DCDuring TALK 21:38, 25 March 2013 (UTC)[reply]

[Nodding sagely.] —Michael Z. 2013-03-25 22:24 z

I think I may have done it, using MediaWiki:Print.css (I keep finding more style sheets). Please reload and confirm. —Michael Z. 2013-03-25 22:24 z

It seems to have suppressed the bad urls and not the ones we'd probably want, such as in Citations. But I should try again later to make sure that there isn't cache delay or something (though there shouldn't be as the Sanskrit was not helped by what was on my css page insertion). DCDuring TALK 22:47, 25 March 2013 (UTC)[reply]

Random entry (by language) broken

If you click on the Random entry button, the functionality is as usual. However, if you click on (by language) and then on any of the languages listed on http://en.wiktionary.org/wiki/Wiktionary:Random_page , you get the following error:

403: User account expired

The page you requested is hosted by the Toolserver user hippietrail, whose account has expired. Toolserver user accounts are automatically expired if the user is inactive for over six months. To prevent stale pages remaining accessible, we automatically block requests to expired content.

If you think you are receiving this page in error, or you have a question, please contact the owner of this document: hippietrail [at] toolserver [dot] org. (Please do not contact Toolserver administrators about this problem, as we cannot fix it—only the Toolserver account owner may renew their account.)

HTTP server at toolserver.org - ts-admins [at] toolserver [dot] org

Please get this sorted out. Kingturtle (talk) 22:51, 23 March 2013 (UTC)[reply]

Comment about this and about other, related/similar things that are broken (as a result of toolserver accounts expiring): if no-one plans on fixing certain gadgets in the near future, the links to them should be removed. - -sche (discuss) 23:06, 23 March 2013 (UTC)[reply]

There is a plan to move the tools from the Toolserver to the Wikimedia Labs, where tools should be easier to maintain and manage (no user-defined time limit). We can't migrate the tools just yet, though. Dakdada (talk) 17:10, 25 March 2013 (UTC)[reply]

Can you put a note atop http://en.wiktionary.org/wiki/Wiktionary:Random_page explaining that none of those links currently work, but that a plan is in the works to get them working again? Kingturtle (talk) 18:47, 28 March 2013 (UTC)[reply]

Done - -sche (discuss) 19:20, 28 March 2013 (UTC)[reply]

Page linking to itself?

According to Special:WhatLinksHere/နီး, the page နီး links to itself, but I can't find any self-link there. What am I missing? —An gr 16:07, 24 March 2013 (UTC)[reply]

I did a null edit and it went away. Probably an old transclusion. —Michael Z. 2013-03-24 17:14 z

Thanks! —An gr 19:38, 24 March 2013 (UTC)[reply]

Gender tag popups

Since last night, the gender tags like m, f, n, pl have been showing up with little dotted underlines which tell you what they stand for when you mouse over them. Is there some preference I can set or something I can set in my CSS to turn it off? I find it very annoying. —An gr 11:24, 25 March 2013 (UTC)[reply]

Fixed. However, there's a lot of caching that goes on. To bypass the caching, you can try visiting //bits.wikimedia.org/en.wiktionary.org/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector&* and performing a "hard" refresh (holding down the Shift key while you refresh) a few times. And even this is not necessarily guaranteed, since some of the caching is server-side rather than client-side. If it doesn't work for you, and this bothers you enough that you want the fix ASAP, you can copy that portion of CSS to your own personal Special:MyPage/common.css, since changes to that page are picked up immediately. —Ruakh_TALK 13:58, 25 March 2013 (UTC)[reply]

Well, the dotted underlines are gone already. The popups are still there, but they aren't so annoying. Thanks, Ruakh! —An gr 14:09, 25 March 2013 (UTC)[reply]

The popups have always been there. It's just not very obvious that they are there, so I actually thought the dotted line was kind of helpful, even if it didn't look very good. —CodeCa t 14:27, 25 March 2013 (UTC)[reply]

I would be content with a way to turn it off for me even if it's left on as a default if other people think it's useful. It's just that to me, the dotted underline says, "Here's something that requires your urgent attention", but the gender tags don't require it. I've been using foreign-language dictionaries for over 35 years now; I know what m, f, n, and pl stand for. —An gr 14:31, 25 March 2013 (UTC)[reply]

Oops; I made a change to MediaWiki:Vector.css. I forgot that Firefox adds underlines to abbr’s with titles (I use Safari, which does not). It’s one of the few cases where the major browsers have contrary visual rendering. —Michael Z. 2013-03-25 16:18 z

I don't understand this at all; your comment doesn't seem to explain anything. Actually, your comment would be a better explanation for not doing what you did. Maybe you've left out a few steps of your reasoning? (Plus, I don't think your comment is factually accurate, since I use Firefox exclusively, and I don't think I was seeing those dotted-underlines until you made this change. But I wouldn't swear to that.) BTW, if your goal was merely to remove underlines when there's no title, then it would make more sense to use abbr:not([title]), rather than hoping that your explicit abbr[title] rule is restoring a default behavior rather than creating a new one. (Or is that not possible for some reason? I admit, I haven't tested.) But anyway, I've reverted your change for now, since it's obvious that it doesn't have consensus. —Ruakh_TALK 01:14, 26 March 2013 (UTC)[reply]

The Vector skin needlessly adds underlines and the help cursor to all abbr elements lacking titles. I had corrected that in MediaWiki:Vector.css, but inadvertently caused the problem being discussed here and resolved by your edit in MediaWiki:Common.css. Now the underlines and help cursor are back again because you removed my edits. If you use a :not selector, your fix will not work in MSIE 8. —Michael Z. 2013-03-26 14:11 z [edited —MZ]

Nope, I still don't get it. As far as I can tell, "the problem being discussed here" is the presence of dotted underlines. Seeing as your change to MediaWiki:Vector.css involved a whole chunk of CSS whose sole purpose was to add those dotted underlines in certain cases, I don't see how that can have been inadvertent . . . —Ruakh_TALK 05:04, 27 March 2013 (UTC)[reply]

Japanese pitch accent template Template:ja-accent-common

I've recently gotten my hands on NHK Broadcasting Culture Research Institute, editor (1998), NHK日本語発音アクセント辞典 [NHK Japanese Pronunciation Accent Dictionary] (in Japanese), Tokyo: NHK Publishing, Inc., →ISBN.

I'd like to rework {{ja-accent-common}} a bit to change how the information is presented, moving the type of pitch to the front of the line and building in an option to use IPA, as already suggested (but not implemented) by the use of square brackets for the romaji. Would anyone object? I think I'm just about the only one using this template in recent edits. And should I put this thread in WT:BP instead? Note that I am proposing a change in presentation only. -- Eiríkr Útlendi │ Tala við mig 18:51, 25 March 2013 (UTC)[reply]

But wouldn’t square brackets look like phonetic IPA, if it appears in the same context as slashes for phonemic IPA? —Michael Z. 2013-03-25 22:29 z

Sorry, I was stuck in my own head and speaking in shorthand, as it were.

When provided with the correct params, this template currently outputs a string in square-bracket format, suggesting phonetic IPA, but without using the IPA font styling. In uses of it that I've seen entered by editors before me, folks have been using this to input romaji + IPA-style tone diacritics. I've followed suit so far, but this strikes me increasingly as incorrect.

My proposal includes reworking this part of the template to use {{IPAchar}} for proper font formatting, to link to the proper phonology page much as when using {{IPA|lang=ja}}, and to use tone letters on each mora for better visual clarity, since actual phonetic IPA already includes diacritics over the letters that would interfere with the tone diacritics. I'd also explain in the documentation that this is intended for IPA, not romaji. Lastly, I'd see about going back through existing entries and fixing the transcriptions.

If folks are interested, I'll knock up a demo of what I'm thinking. -- Eiríkr Útlendi │ Tala við mig 22:38, 25 March 2013 (UTC)[reply]

Maybe you've seen this, but here's the BP discussion about this from two years ago, which links to even more discussion from previous years. The person who started the discussion seems to have left WT since a vote on Pinyin as you can see here Special:Contributions/Vaste. I've never touched pitch myself but it would be awesome to have more of it on here. --Haplology (talk) 06:35, 26 March 2013 (UTC)[reply]

Cool, thanks for the links, Haplology. I'll have to read those later (hopefully today).

And yeah, pitch is pretty important in JA for distinguishing words that would otherwise be pure homophones. Almost no Japanese dictionaries seem to include this information, be they JA<>JA or JA<>something else, making it difficult for learners to get a handle on. Context can handle a lot of that for us non-native speakers, but I have in the past noticed odd looks on folks' faces when I've used the wrong pitch and they have to re-run what I've said through their internal parsers to make sense of it. After getting a copy of NHK's official pitch accent dictionary, specifying the standard for Japanese broadcasters, I would really like to get that info into as many entries here as possible. Figuring out the best format for the template is part of that. :) -- Eiríkr Útlendi │ Tala við mig 15:19, 27 March 2013 (UTC)[reply]

So after reading those various threads that Haplology linked to, and the threads linked therefrom, I found the following:

Various dictionaries and other sources use a number notation.

Most commonly, this appears to be the number of the syllable after which there is a w:downstep. So for 殊に (koto ni), this is marked as [1] in w:Daijirin, and for 言葉 (kotoba), this is marked as [3] in Daijirin. Words with no downstep are either not marked with any number, or marked with [0] if there are any homophones that do have a downstep. Words with multiple possible pitch contours are marked with the most common number first, such as [2][0] for 陽炎 (kagerō).

Some dictionaries use a diagonal arrow notation. This includes some entries on the JA WT.

This notation uses ↗ just before the first kana with higher pitch, and ↘ just before a downstep. Note that these characters are also used in IPA for global rise and global fall, which are specifically defined as not being used to distinguish words (see w:Intonation_(linguistics)), and thus should probably not be used here on the EN WT to mark Japanese pitch accent.

Some dictionaries use a vertical arrow notation.

This notation mostly just uses ꜜ, the IPA character for a downstep, just before a downstep. Some sources might use the IPA w:upstep character ꜛ just before where the pitch rises.

Some dictionaries and other sources use accent diacritics.

This uses the gràvè àccènt to mark low tone, and the ácúté áccént to mark high tone. Downstep seems to be marked either by a change from acute to grave, or (especially if the downstep is at the end of the word) by use of the downstep arrow ꜜ.

Japanese accent dictionaries use an overline notation to mark high tone, and a hooked overline notation to mark the downstep where high tone ends. This is true both of NHK's official broadcasting accent dictionary, and Sanseido's accent dictionary (sample page here).

IPA provides both accent diacritics and w:tone letters.

Tone letters look like ˥ ˦ ˧ ˨ ˩, and for Japanese purposes, we might only need the ˦ and the ˨.

Between the two, I think tone letters are much more usable, as Japanese phonetic IPA already makes use of other over-letter diacritics (mostly just the tilde to mark nasals, but also the diaeresis for some vowel sounds), and these don't combine very legibly with accent diacritics. Tone letters are given just to the right of each mora, and thus don't interfere with any other over-letter diacritics.

Ultimately, I think we should use overline and hooked overline notation for kana, and IPA with tone letters and the downstep arrow for complete phonetic IPA information. I'd like to avoid using any romaji at all in the pronunciation section. This would allow for combining the IPA and pitch accent into one bulleted line, rather than the two I've been using so far as seen at 墓地.

Would that be acceptable to other editors? Does anyone feel strongly about using another notation, as well as or instead of the above?

(Note that this is so far all for the "standard" accent used in Japanese broadcasting, based on pitch accent patterns in Tokyo. If folks have access to resources describing other Japanese pitch accent patterns, please chime in.) -- Eiríkr Útlendi │ Tala við mig 22:08, 27 March 2013 (UTC)[reply]
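
To make the number notation and the tone-letter idea above concrete, here is a rough sketch that turns a list of morae plus a downstep number into a string with ˦/˨ tone letters and the ꜜ downstep arrow. It assumes the usual textbook description of the Tokyo pattern, which is not spelled out in this thread: the first mora is low unless the downstep falls right after it, and everything after the downstep is low. The function name pitchPattern is invented for illustration, and the romanized morae are only placeholders for whatever transcription ends up being used.

```javascript
// Tone letters and downstep arrow as suggested in the discussion above.
var HIGH = '˦';
var LOW = '˨';
var DOWNSTEP = 'ꜜ';

// morae:  array of strings, one per mora (placeholder transcription).
// accent: number of the mora after which the downstep falls; 0 means no downstep.
// Assumption: first mora is low unless accent === 1 (standard Tokyo description).
function pitchPattern(morae, accent) {
  return morae.map(function (mora, i) {
    var position = i + 1;
    var high;
    if (accent === 1) {
      high = position === 1;                         // high first mora, low afterwards
    } else if (accent === 0) {
      high = position > 1;                           // low first mora, then high to the end
    } else {
      high = position > 1 && position <= accent;     // high from mora 2 up to the downstep
    }
    var mark = (position === accent) ? DOWNSTEP : ''; // arrow right after the accented mora
    return mora + (high ? HIGH : LOW) + mark;
  }).join(' ');
}

console.log(pitchPattern(['ko', 'to', 'ba'], 3)); // ko˨ to˦ ba˦ꜜ
console.log(pitchPattern(['ko', 'to', 'ni'], 1)); // ko˦ꜜ to˨ ni˨
```

Sticking to just two tone letters keeps the visual variation small, which fits the concern about not exaggerating the pitch movement.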

Personally I agree and much prefer the overhead and overhead hook notation in addition to IPA notation, but maybe that's partly because I'm most familiar with overhead notation. The biggest danger with second language learners is that they will put way too much pitch variation into their pronunciation and this is why Japanese spoken in Hollywood is painful to the ears. *shudder* The overhead line format drives home the fact that it's pretty much flat, especially to foreign ears. --Haplology (talk) 00:52, 28 March 2013 (UTC) (PS I mean spoken by actors who clearly memorized some Japanese for their lines but have no other knowledge of it, not ordinary people living in Hollywood) --Haplology (talk) 02:31, 29 March 2013 (UTC)[reply]

I just got the iPod app version of the book you mentioned ($34.99 + it has sound clips) and after using it a while, I think it might even be better to directly combine overhead/overhook and IPA notation as I think they work wonderfully together. I also think that the tone letters may prompt "Hollywood Japanese pronunciation" unless very little variation is visualized (i.e. keep to the top two steps). Syllables should be distinctly denoted for clarity. As for clarifying what the overline/overhook means, I believe that can be provided in the Japanese phonology page. Here's a visual example of what I'm thinking: IPA(key): [a̠.ɺ̠a̠.ɰᵝa̠.sɯ̥ᵝ] --Soardra (talk) 20:02, 6 April 2013 (UTC)[reply]

Alpha bar not working again

The alpha bar extension (I don't know its real name) isn't working again: it never appears at all. I mean the horizontal row of previous/next entries that used to appear above the headword, delayed after page load, via JavaScript. So if you're at nut, you might see a bar of links like nuclear - nuclearity - nugget - nut - nuts - nutty. Equinox ◑ 12:20, 26 March 2013 (UTC)[reply]

See #Random entry (by language) broken. I believe Hippietrail is also the author. Dakdada (talk) 16:50, 26 March 2013 (UTC)[reply]

Curses. That was one of my favourite features. I wish we'd centralise our scripts so they were a permanent part of the wiki. Equinox ◑ 10:00, 28 March 2013 (UTC)[reply]

That's partly the purpose of the migration to Wikitech. Dakdada (talk) 14:12, 28 March 2013 (UTC)[reply]

Context tags: selective visibility and categorization

Lua-ization may bring us opportunities to make context tags better serve the diverse needs and tastes of our user and contributor populations. In the recent past, we have had disagreements about the desirability of topical (eg tagging all senses of noun, presumably widely understood as grammar) rather than grammatical, regional, timeliness (obsolete etc), and register tags (which would leave at least one sense of noun untagged). Currently there is a dispute about the use of relatively obscure (from a user perspective) linguistic terms (ergative, ambitransitive) in context tags. I have previously wished to be able to classify and tag at the sense level various terms using grammatical and semantic categories and labels. I suspect that the complexity and performance issues of {{context}} would have made it difficult to execute selective display of context tags. In any event, at present, it seems silly to try to force such a thing on {{context}} when, according to Ruakh, it is one of the templates that would most benefit from Lua.

There are questions that will have to be raised at BP, such as:

Do others agree that some context tags are best not imposed on normal users?
Could we agree on some way of visually distinguishing topical vs usage context categorization or should we just enable selective suppression of topical tags?

For tags best not imposed on normal users, there are many questions about how to implement selective display. Conceivably, as I understand the capabilities of CSS, there are virtually no limits on what could be selectively displayed, while being hidden by default from normal users. What would be required is more custom CSS. But gadgets could allow some common subsets (eg, all tags in a given language or set of languages) to be displayed.

Given my incredibly superficial grasp of the technologies, I start by raising the possibility here so that technical considerations could be reflected in any BP trial balloon or proposal. DCDuring TALK 19:49, 26 March 2013 (UTC)[reply]

I think our nomenclature surrounding these templates is misleading. Best to ignore the term “context” here. “Topical” is also easily misconstrued.

There are two kinds of labels: restricted-usage and grammatical. The subject-area labels are a type of usage label indicating a technical term’s or sense’s use chiefly within a specialty, or having a different meaning within a specialty, or having a meaning prescribed by some authority. Arguably noun, the common word for a well-known concept, should not be labelled “grammar”.

CSS and gadgets could certainly be used the way you propose, although I dislike the WT:PREF interface used to make such preferences inaccessible.

But why hide such labels at all? We make masculine, feminine, neuter, singular and plural less obtrusive by abbreviating them m, f, n, s, and pl. We could do the same for the more obscure grammatical labels, and as a result I would learn a thing or two. —Michael Z. 2013-03-26 20:28 z

Why not finally (gasp!) migrate to {{label}}? -- Liliana • 20:31, 26 March 2013 (UTC)[reply]

@Mzajac: I think we need to hide some labels that use terms not part of the general vocabulary of educated non-specialist users to avoid intimidation and needless questions. Also, Ruakh pointed out that some dictionaries seem to use labels in a way that reflects topic rather than usage context. I've thought our problem is that we have no good means to distinguish a label that indicates a topic vs a usage context.

@Liliana: I either follow the herd or would have to depend on documentation. How does that family of templates work? DCDuring TALK 22:21, 26 March 2013 (UTC)[reply]

Yeah, what is all that!? I proposed the idea ages ago, but I didn’t get the impression there was any enthusiasm for it. —Michael Z. 2013-03-26 22:37 z

My memory for such things isn't so good. In any event, rereading that kind of thing now would probably have a very different effect on me now. I'll see what I can find. What proposals have you made? DCDuring TALK 23:39, 26 March 2013 (UTC)[reply]

I was drawn away from editing by other things then, and later forgot how much discussion this prompted. Must catch up on this. —Michael Z. 2013-03-27 03:58 z

Edit request

It would be nice if someone could complete this edit request. Norwegian uses nested translations, and the language code no should therefore be rejected. --Njardarlogar (talk) 08:05, 29 March 2013 (UTC)[reply]

Hang on, I don't think there's a consensus not to use no yet, is there? Mglovesfun (talk) 09:59, 29 March 2013 (UTC)[reply]

You can't both nest and not nest for the same macro language; that won't look any good. Norwegian has been nested since, er, 2009 or something. --Njardarlogar (talk) 10:20, 29 March 2013 (UTC)[reply]

Didn't say I opposed it, just it needs more discussion. Mglovesfun (talk) 19:25, 29 March 2013 (UTC)[reply]

I view that as a largely separate topic. You have to mark Norwegian entries as either Nynorsk or Bokmål one way or the other, and the code no tells nothing on its own. Nesting, which has been implemented for a while, solves this problem elegantly. Remember that if so desired, the {{t}} template could be rewritten so that nb and nn both pointed to ==Norwegian==. By removing no from the translations, we ensure that all translations are marked properly. Right now, quite a few are not. --Njardarlogar (talk) 19:33, 29 March 2013 (UTC)[reply]
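
As a rough illustration of what rejecting no while nesting nb and nn could look like in a script, here is a small sketch. Everything in it is hypothetical: the table names, the function resolveTranslationCode, and the shape of the result are invented for the example and do not describe the actual script behind the edit request.

```javascript
// Hypothetical table: codes that nest under a shared language heading.
var NESTED_CODES = {
  nb: { section: 'Norwegian', label: 'Bokmål' },
  nn: { section: 'Norwegian', label: 'Nynorsk' }
};

// Hypothetical table: codes that should be refused, with a hint for the editor.
var REJECTED_CODES = {
  no: 'use nb (Bokmål) or nn (Nynorsk) instead'
};

function resolveTranslationCode(code) {
  if (Object.prototype.hasOwnProperty.call(REJECTED_CODES, code)) {
    return { ok: false, reason: REJECTED_CODES[code] };
  }
  if (Object.prototype.hasOwnProperty.call(NESTED_CODES, code)) {
    var entry = NESTED_CODES[code];
    return { ok: true, section: entry.section, label: entry.label };
  }
  // Any other code is left to whatever handling already exists.
  return { ok: true, section: null, label: null };
}
```

With this, resolveTranslationCode('nn') would nest the translation under Norwegian as Nynorsk, while resolveTranslationCode('no') would come back with ok set to false and a hint to pick nb or nn.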

Question About Spambots

We seem to be getting a good number of spam user pages lately. I understand the ones that slip a url into innocuous-looking text: the appearance of the link in a high-traffic site like ours makes it seem more important to Google and means it gets listed earlier in the list of results than it would otherwise. I'm a little puzzled by the ones that do the same, but without the url: "blah, blah, blah, [phrase including a brand name], blah, blah, blah". Although that phrase may match the text part of the link in their other spam, it doesn't actually point to the site being promoted. How is this worth spending their bot's time on it? Just curious. Chuck Entz (talk) 01:56, 30 March 2013 (UTC)[reply]

I asked Amgine about this on IRC once and I think he said that combinations of unlinked terms on a high-traffic Web page are still capable of influencing Google's PageRank algorithms. Equinox ◑ 13:11, 30 March 2013 (UTC)[reply]

JavaScript question

How do I check in JavaScript whether the current page exists? I.e. if I am on sdfsdfsdfsfdf I want to know that it does not exist/is not created. --Njardarlogar (talk) 17:08, 30 March 2013 (UTC)[reply]

if (wgArticleId) { /* current page exists */ } else { /* it doesn't exist */ } This, that and the other (talk) 09:08, 31 March 2013 (UTC)[reply]

Just what I was looking for. Thanks. --Njardarlogar (talk) 09:38, 31 March 2013 (UTC)[reply]

I believe we're actually supposed to write mediaWiki.config.get('wgArticleId') rather than simply wgArticleId; as I understand it, the latter is deprecated, and will eventually be removed. —Ruakh_TALK 19:13, 31 March 2013 (UTC)[reply]
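
Putting the two answers together, a minimal sketch of the check might look like the following. The function name currentPageExists is made up; the config argument is assumed to be any object with a get(name) method, which on a live page would be mediaWiki.config. As far as I know, wgArticleId is 0 for pages that have not been created, so the test is simply whether it is greater than zero.

```javascript
// config: any object exposing get(name); on a wiki page, pass mediaWiki.config.
function currentPageExists(config) {
  // wgArticleId is 0 for pages that do not exist (and for special pages).
  return config.get('wgArticleId') > 0;
}
```

On a wiki page the call would be currentPageExists(mediaWiki.config). Passing the config object in, rather than reading a global, also makes the check easy to try outside the browser with a stub.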