Module talk:links

test cases: Module:links/testcases

Alt text

What should language_link do with the alt-text if the main text already contains links? I presume that if given something like [[text|tēxt]], it shouldn't actually use the alt text, just add the language name. So how should it handle this? —CodeCat 21:29, 29 March 2013 (UTC)

Yeah, I made a change for that. --Z 22:47, 29 March 2013 (UTC)

Your change doesn't actually work right. Consider what it would do if text were "[[word|wōrd]] is [[this]]". —CodeCat 22:56, 29 March 2013 (UTC)

Yeah, forgot that, fixed now, I think. --Z 23:27, 29 March 2013 (UTC)

What should we do in cases like {{l|en|a [[text]]|tēxt}}? The current code will return a [[text#English|tēxt]]. I think the value of alt shouldn't affect the output in this case either. If we create more links, all of them will have "tēxt" as their link title. --Z 02:50, 31 March 2013 (UTC)

The alt text should only really be used if there are no links in the text. If there are, it should be ignored, so it should give a [[text#English|text]]. —CodeCat 03:51, 31 March 2013 (UTC)
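
The rule agreed on in this thread (use alt only when the text has no embedded links, otherwise just add the language section to every link) could be sketched roughly as follows. This is only an illustration, not the code in Module:links: the parameter order, the `lang_name` argument and the use of plain Lua string functions instead of mw.ustring are all assumptions.

```lua
-- Hypothetical sketch, not the actual Module:links implementation.
-- text:      term, possibly containing [[wikilinks]]
-- alt:       alternative display text, may be nil
-- lang_name: section name to link to, e.g. "English" (assumed parameter)
local function language_link(text, alt, lang_name)
    -- Plain (non-pattern) search for an opening link bracket.
    if text:find("[[", 1, true) then
        -- Embedded links present: ignore alt and add the language
        -- section to each link, keeping each link's own display text.
        local result = text:gsub("%[%[([^%]|]+)|?([^%]]*)%]%]", function(target, display)
            if display == "" then
                display = target
            end
            return "[[" .. target .. "#" .. lang_name .. "|" .. display .. "]]"
        end)
        return result
    end
    -- No embedded links: alt, if given, becomes the display text.
    return "[[" .. text .. "#" .. lang_name .. "|" .. (alt or text) .. "]]"
end
```

With this sketch, `language_link("a [[text]]", "tēxt", "English")` ignores the alt text, while `language_link("text", "tēxt", "English")` uses it as the link title.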

Annotated link

I made this a function that doesn't receive a frame on purpose. What if another Lua function wants to make a link? It would be a bit silly if it had to call the template... —CodeCat 14:51, 30 March 2013 (UTC)

Ok, feel free to revert. I thought the function was supposed to be used only in l and term. (But we need to add another function which takes the frame table as an argument.) --Z 15:02, 30 March 2013 (UTC)

Well, my intention was that this module be used for anything related to making links. {{l}} and {{term}} aren't the only ones that make links, we also have the form-of templates (which really work like {{term}} because they also have "annotations", but with some extras), and {{head}} which could use this module. Provided that it's made in such a way that nothing is written to work only for {{l}}, at least. —CodeCat 16:32, 30 March 2013 (UTC)

But they are still different, even l and term don't work exactly the same, and the current code is more similar to term -- term is like this: <word> (<tr>, "<gloss>"), and l: <word> (<tr>) <g> ("<gloss>"). --Z 17:02, 30 March 2013 (UTC)

That's why I said it may not be a good idea to try to mimic the templates too closely. Do we actually want them to be different in that way? I think they should both work like {{term}} does, with only a single set of brackets instead of two. —CodeCat 17:16, 30 March 2013 (UTC)

Do we need community consensus to make such a change? The change is minor and probably not important, but the template is widely used... --Z 17:23, 30 March 2013 (UTC)

I don't think so. It doesn't actually change anything that is really important, just a small cosmetic detail, which can easily be changed back (but I doubt anyone would disagree). —CodeCat 19:05, 30 March 2013 (UTC)

"main" and arguments

What is this function actually meant to do? What are the parameters for? Could it have a better name? Also, the way the parameters are assigned is just wrong; it creates global variables which should be avoided, especially in this case. A better way of doing it would be:

local args = frame:getParent().args
local gloss = args["gloss"] or ""

—CodeCat 19:11, 30 March 2013 (UTC)

We need a function like this to make the module accessible from wiki pages. We'll change the name if we feel the need to define more functions for creating links in future, e.g. links in a format that is different from that of term and l (see Template:fa-conj for example, in which transliterations are on a new line). Regarding variables, yeah that's better, I'll change it. --Z 19:22, 30 March 2013 (UTC)

Maybe it should just be called "template_l" so that we know it's just for that template? —CodeCat 19:25, 30 March 2013 (UTC)

Ok, we may change that to main if we decide to use that for term, etc. too. Regarding variables, I think the current form is better. By applying your change, all variables will always be true, because they are either a text or an empty string, "", both of which are true in Lua. Then we can't use expressions like "(alt or text)". --Z 19:31, 30 March 2013 (UTC)

Yes, but there is an advantage too. In templates, you can insert a parameter that might be empty, and if it's empty, no text is inserted there. In Lua, if you try to insert nil into a text, you get a script error. So we will have to decide which is more convenient. In any case, the current form isn't better, it still needs to be changed because it uses global variables. —CodeCat 19:35, 30 March 2013 (UTC)

But does that really happen? So far, the code is written in a way that before adding any variable to the output text, it checks that it's not nil (except when both text and alt are nil, but I think raising an error is actually appropriate in this case). --Z 19:47, 30 March 2013 (UTC)
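
One way to reconcile the two positions above (local variables, but still being able to write "alt or text") is to turn empty parameters into nil when reading them. The sketch below is only an illustration: `blank_to_nil` and `build_link` are made-up names, the output format is a simplification, and `args` is a plain table standing in for frame:getParent().args, which is not available outside Scribunto.

```lua
-- Hypothetical helper: treat an empty template parameter as absent,
-- so that expressions like (alt or text) keep working.
local function blank_to_nil(value)
    if value == "" then
        return nil
    end
    return value
end

-- `args` stands in for frame:getParent().args (assumption for this sketch).
local function build_link(args)
    -- Locals only; nothing leaks into the global environment.
    local text = blank_to_nil(args[2])
    local alt = blank_to_nil(args[3])
    local gloss = blank_to_nil(args["gloss"])

    local display = alt or text
    if not display then
        -- Both missing: raising an error is appropriate here.
        error("either the term or the alt text must be given")
    end

    local out = display
    -- Check for nil before concatenating, to avoid a script error.
    if gloss then
        out = out .. ' ("' .. gloss .. '")'
    end
    return out
end
```

The design trade-off is the one discussed above: nil-valued locals must be checked before concatenation, but they make fallbacks such as `alt or text` possible.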

g

Aren't we going to support parameters for gender and number or what? --Z 17:36, 31 March 2013 (UTC)

{{l}} supports it, so I suppose we'll have to. I don't really like the idea of putting grammatical information into a template like this, but for now we should focus on making the existing template work so we can convert it. We can discuss actual changes later on. —CodeCat 18:25, 31 March 2013 (UTC)

Also for that reason, please do not add extras like automated transliterations or removal of macrons. Those are not in {{l}} so they shouldn't be in this either, not until after we have converted {{l}} and are sure that it works. —CodeCat 18:28, 31 March 2013 (UTC)

But then again, it's the best time to perform any planned changes, because we are rewriting the code. --Z 19:01, 31 March 2013 (UTC)

It would be if we could start from scratch. Unfortunately, we can't, so we have to incorporate backwards compatibility into our plans. —CodeCat 19:20, 31 March 2013 (UTC)

General linking module?

If this module is going to be used as a general module for linking, then I think it should include the functionality provided by {{wlink}}, {{wlink2}}, and {{makelink}}. Also, most or all of export.template_l should be moved into more general functions. --Yair rand (talk) 22:27, 7 April 2013 (UTC)

What extra functionality do those templates provide exactly? And the problem with moving the functionality into general functions is that currently there is a variety of parameter names for different templates. The purpose of export.template_l is to "convert" these parameters into a standard format, and then forward it to a more general function. I don't know if it does that well enough, but that is the idea. —CodeCat 22:30, 7 April 2013 (UTC)

{{wlink|[[test]]}} = Template:wlink, {{wlink|test}} = Template:wlink. --Yair rand (talk) 22:35, 7 April 2013 (UTC)

I think the module already does that currently. But you can try it to make sure. —CodeCat 22:48, 7 April 2013 (UTC)

I just checked. It does not. It should probably also be able to support {{l-self}}, by the way. --Yair rand (talk) 01:13, 9 April 2013 (UTC)

Oh, I see what happened. It did support it at one point, but to avoid having too many potential problems to deal with when changing over {{l}}, it was removed again. And I think that's a good idea for now. —CodeCat 01:24, 9 April 2013 (UTC)

? You think that additional features should be added after the module is already in use on millions of pages, instead of before? That seems kind of backwards to me. Reworking it to add the rest of the features after it's in use sounds more likely to cause some problems. --Yair rand (talk) 01:41, 9 April 2013 (UTC)

It makes more sense to me if we limit the number of things that can go wrong and need to be fixed. I would rather make sure that the template works before we start adding more things to it rather than after. It also means that when we do add something new and it breaks things, we immediately know what caused it and what to revert. If we try to do everything in one go, we might be swamped with problems and have no choice but to roll everything back and start over. It's much easier to make small incremental changes to something you know works, rather than adding things to something you're not even sure worked before. —CodeCat 01:47, 9 April 2013 (UTC)

altForm from WT:EDIT

In prepare_title, la-utilities is imported if the language is Latin. Rather than setting up a whole bunch of unique functions for every language, perhaps we should just copy over the altForms table from WT:EDIT:

var altForm = {
		ang: {from:"ĀāǢǣĊċĒēĠġĪīŌōŪūȲȳ", to:"AaÆæCcEeGgIiOoUuYy", strip:"\u0304\u0307"}, //macron and above dot
		ar: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		fa: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		ur: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		chl: {from:"ÁáÉéÍíÓóÚú", to:"AaEeIiOoUu",strip:"\u0301"}, //acute accent
		he: {strip:"\u05B0\u05B1\u05B2\u05B3\u05B4\u05B5\u05B6\u05B7\u05B8\u05B9\u05BA\u05BB\u05BC\u05BD\u05BF\u05C1\u05C2"},
		hr: {from:"ȀȁÀàȂȃÁáĀāȄȅÈèȆȇÉéĒēȈȉÌìȊȋÍíĪīȌȍÒòȎȏÓóŌōȐȑȒȓŔŕȔȕÙùȖȗÚúŪū",
			 to:"AaAaAaAaAaEeEeEeEeEeIiIiIiIiIiOoOoOoOoOoRrRrRrUuUuUuUuUu",
			 strip:"\u030F\u0300\u0311\u0301\u0304"},
		la: {from:"ĀāĒēĪīŌōŪūȲȳ", to:"AaEeIiOoUuYy",strip:"\u0304"}, //macron
		lt: {from:"áãàéẽèìýỹñóõòúù", to:"aaaeeeiyynooouu", strip:"\u0340\u0301\u0303"},
		nci: {from:"ĀāĒēĪīŌōŪūȲȳ", to:"AaEeIiOoUuYy",strip:"\u0304"}, //macron
		ro: {from:"ŞŢşţ", to:"ȘȚșț"},
		ru: {strip:"\u0300\u0301"},
		uk: {strip:"\u0300\u0301"},
		be: {strip:"\u0300\u0301"},
		bg: {strip:"\u0300\u0301"},
		mk: {strip:"\u0300\u0301"},
		sh: {
			from:"ȀȁÀàȂȃÁáĀāȄȅÈèȆȇÉéĒēȈȉÌìȊȋÍíĪīȌȍÒòȎȏÓóŌōȐȑȒȓŔŕȔȕÙùȖȗÚúŪūѝӣ",
			to:  "AaAaAaAaAaEeEeEeEeEeIiIiIiIiIiOoOoOoOoOoRrRrRrUuUuUuUuUuии",
			strip:"\u030F\u0300\u0311\u0301\u0304"
		},
		sr: {
			from:"ȀȁÀàȂȃÁáĀāȄȅÈèȆȇÉéĒēȈȉÌìȊȋÍíĪīȌȍÒòȎȏÓóŌōȐȑȒȓŔŕȔȕÙùȖȗÚúŪū",
			to:"AaAaAaAaAaEeEeEeEeEeIiIiIiIiIiOoOoOoOoOoRrRrRrUuUuUuUuUu",
			strip:"\u030F\u0300\u0311\u0301\u0304"
		},
		sl: {from: "áÁàÀâÂȃȂȁȀéÉèÈêÊȇȆȅȄíÍìÌîÎȋȊȉȈóÓòÒôÔȏȎȍȌŕŔȓȒȑȐúÚùÙûÛȗȖȕȔệỆộỘẹẸọỌəł",
			 to: "aAaAaAaAaAeEeEeEeEeEiIiIiIiIiIoOoOoOoOoOrRrRrRuUuUuUuUuUeEoOeEoOel",
			 strip: "\u0301\u0300\u0302\u0311\u030f\u0323"},
		tr: {from:"ÂâÛû", to:"AaUu",strip:"\u0302"},
		zu: {strip_init_hyphen: 1}
	};

--Yair rand (talk) 22:37, 7 April 2013 (UTC)

Because it is a data table, it seems like it would be better suited to Module:languages. It would also be faster, because that module would only be imported once per page, while this module would be imported once per link template. —CodeCat 22:47, 7 April 2013 (UTC)

Added. I ~~stole~~ adopted the approach used in strip_macrons() in Module:la-utilities, so that there is no need for "from" and "to" fields.

I support moving the data to Module:languages BTW. --Z 13:41, 27 June 2013 (UTC)

What you added doesn't actually work. Unicode has many precomposed characters, which are a single character consisting of the base letter and the diacritic. If you look only for the diacritic, you won't find them. —CodeCat 14:14, 27 June 2013 (UTC)

I know. It works, all characters are decomposed by mw.ustring.toNFD() before the replacement. --Z 14:29, 27 June 2013 (UTC)

Oh, I didn't know that existed. It seems useful, but is it fast? —CodeCat 15:18, 27 June 2013 (UTC)

Also, the combining characters are currently so jumbled up in the code that you can't see what they are. Is there a way to fix that? —CodeCat 15:21, 27 June 2013 (UTC)

Spelling them out in UTF-8, which will at least make editing easier. But otherwise, not much can be done. Keφr 15:42, 27 June 2013 (UTC)

What about putting spaces in between? It would then match on a space as well, but you could always just remove the spaces again before using the string. So doing... string operations on a regex pattern. Why not? :) —CodeCat 15:48, 27 June 2013 (UTC)

Well, if we are no longer afraid of getting hands dirty with somewhat performance-hitting operations, why not import the Unicode database, specify the characters with an array of their Unicode names, and let a function convert that to a regular expression? Keφr 15:56, 27 June 2013 (UTC)

Of course they should be spelled in UTF-8, but I don't know why they don't work in Lua (I've tried different forms, decimal, hex... none of which worked). --Z 16:08, 27 June 2013 (UTC)

Good job, it works now. --Z 16:21, 27 June 2013 (UTC)

This is magic to me. How can I remove breve marks in Latin words? In "ăquā" it only removes the macron. --Vriullop (talk) 12:38, 30 June 2013 (UTC)

Breve marks should not be used at all. See WT:ALA. —CodeCat 12:46, 30 June 2013 (UTC)
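
The "strip only the combining marks" approach from this thread, with the marks spelled out as the bytes of their UTF-8 encoding, might look something like the sketch below. It is an illustration under stated assumptions, not the module's code: the `strip_marks` table and `strip_diacritics` are hypothetical names, only two languages are listed, and the input is assumed to be already decomposed (in Scribunto that would be done with mw.ustring.toNFD(), which plain Lua does not provide).

```lua
-- Combining marks written as decimal byte escapes of their UTF-8 encoding,
-- so that they stay readable in the source.
local GRAVE  = "\204\128"  -- U+0300 COMBINING GRAVE ACCENT (0xCC 0x80)
local ACUTE  = "\204\129"  -- U+0301 COMBINING ACUTE ACCENT (0xCC 0x81)
local MACRON = "\204\132"  -- U+0304 COMBINING MACRON       (0xCC 0x84)

-- Hypothetical per-language data, mirroring the "strip" fields above.
local strip_marks = {
    la = { MACRON },
    ru = { GRAVE, ACUTE },
}

-- Assumes `text` is already in decomposed form (NFD).
local function strip_diacritics(text, lang)
    for _, mark in ipairs(strip_marks[lang] or {}) do
        -- The marks contain no pattern magic characters, so gsub is safe here.
        text = text:gsub(mark, "")
    end
    return text
end
```

A precomposed "ā" (U+0101, bytes 0xC4 0x81) passed in directly is left untouched by this sketch, which is exactly the problem raised above and the reason the decomposition step has to come first.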

Genders

What is wrong with Module:gender and number that it can't be used in this module? It works fine for {{nl-noun}} and {{ca-noun}}. —CodeCat 17:02, 10 April 2013 (UTC)

(1) It separates only with commas, but it should use "and" as well. This can be done using mw.text.listToText(list, ", ", " and "). (2) It doesn't distinguish between genders and numbers: m, f, sg or m, f and sg should be m and f sg. --Z 17:15, 10 April 2013 (UTC)

...that's all? Really, that's not a problem at all, I don't see why you think it is. The difference is by design, because the way the old genders were made, you couldn't tell whether "m f p" meant "masculine plural and feminine plural" or "masculine singular and feminine plural". You're right though that this isn't quite the same as how {{l}} currently works, so this should be changed by fixing all the entries that use {{l}} with more than one gender. g1=m g2=p should be changed into g=m-p . —CodeCat 17:59, 10 April 2013 (UTC)

So let's make it backward compatible: if it has a dash, it follows your way, otherwise do as Template:l does,

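    if gender and gender[1] then
        -- A hyphen in any value (e.g. "m-p") marks the new combined format.
        -- Missing second/third values are treated as empty to avoid concatenating nil.
        if mw.ustring.match(gender[1] .. (gender[2] or "") .. (gender[3] or ""), "%-") then
            local gen = require("Module:gender and number")
            text = text .. " " .. gen.format(gender)
        else
            -- Old format: fall back to the gender template, as Template:l does.
            text = text .. "&nbsp;" .. frame:expandTemplate{title = gender[1], args = {gender[2], gender[3]}}
        end
    end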

then we fix all entries and inform users about the change, and after that we remove the backward compatibility. --Z 03:12, 11 April 2013 (UTC)

Ok, that sounds reasonable. But it's better to do it differently: if the g2= or g3= parameters are specified, use the old method, but if there is only g= or g1= then use the new method. —CodeCat 12:33, 11 April 2013 (UTC)
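
The dispatch rule suggested here (old method when g2= or g3= is present, new method otherwise) can be sketched in plain Lua as below. Everything in it is illustrative: `list_to_text` is a stand-in for mw.text.listToText(list, ", ", " and "), `format_new` is only a placeholder for whatever Module:gender and number really outputs, and `args` is a plain table standing in for the template arguments.

```lua
-- Stand-in for mw.text.listToText(list, ", ", " and "), which needs Scribunto.
local function list_to_text(list)
    local n = #list
    if n == 0 then
        return ""
    elseif n == 1 then
        return list[1]
    end
    return table.concat(list, ", ", 1, n - 1) .. " and " .. list[n]
end

-- Placeholder for the new method: a combined spec such as "m-p".
-- The real formatting is done by Module:gender and number.
local function format_new(spec)
    return (spec:gsub("%-", " "))
end

-- Old method when g2= or g3= is given, new method otherwise.
local function format_gender(args)
    if args.g2 or args.g3 then
        local list = {}
        local first = args.g or args.g1
        if first then
            list[#list + 1] = first
        end
        for _, key in ipairs({ "g2", "g3" }) do
            if args[key] then
                list[#list + 1] = args[key]
            end
        end
        return list_to_text(list)
    end
    return format_new(args.g or args.g1 or "")
end
```

Keying the decision on the parameter names rather than on the presence of a hyphen means a plain g=m still goes through the new method, which is the point of the suggestion above.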

Now we should put something like {{#if:{{{g2|}}}{{{g3|}}}|[[Category:Pages with g2 or g3]]}} in Template:l's code to get a list of pages that should be updated by bot. --Z 12:45, 11 April 2013 (UTC)

Yes, and {{{g1|}}} should be in there as well (it's redundant to g=). —CodeCat 12:52, 11 April 2013 (UTC)

Ok, now we should update {{l}} to use the module, inform users about how they should use gender parameters from now on, and then update the pages in Category:l with g1, g2 or g3 (too lazy to renew my TS account, and my Internet connection is ridiculously slow and limited to update so many pages, anybody interested in doing this part?!) --Z 14:10, 11 April 2013 (UTC)

Script template class inconsistencies

There are a few script templates, such as {{Kore}} and {{unicode}}, that give class names that are different from their template names. Module:links uses the script codes themselves as classes instead of using the script templates, so I think we should probably first edit the script templates to be consistent and see if any problems come up (and make any necessary CSS changes), before starting to use this module. --Yair rand (talk) 11:21, 16 April 2013 (UTC)

We should define their class names in Module:languages. --Z 06:55, 17 April 2013 (UTC)

How would that solve the problem? —CodeCat 10:22, 17 April 2013 (UTC)

We will be able to check in the module if the class name(s) are different from the script name. --Z 10:33, 17 April 2013 (UTC)

But why would we add all this extra complexity when we can change the class names so that they are always the same instead? Isn't that the more obvious solution? —CodeCat 10:36, 17 April 2013 (UTC)

It is, but it is also harder to do. --Z 10:42, 17 April 2013 (UTC)

How so? And just because it's harder doesn't mean it shouldn't be done. —CodeCat 10:58, 17 April 2013 (UTC)

You were right; while replying, I was thinking of an unrelated class-related issue for some reason. --Z 11:04, 17 April 2013 (UTC)

template_l_xform()

language_link() can already do this. --Z 16:21, 16 May 2013 (UTC)

Except it cannot be directly #invoked and the kind of auto-linking it does might be actually undesired. But yes, I actually borrowed some code from it. Keφr (talk) 19:44, 16 May 2013 (UTC)

Testcases failing

I notice that 10 of the tests at Module talk:links/testcases are failing, and five of those are giving actual script errors. Anyone know what happened to cause this? --Yair rand (talk) 21:52, 2 June 2013 (UTC)

The script errors are because of the recent changes to Module:gender and number (spec:find() returns nil, dunno why). Other "failed" tests are actually because of the minor differences in extended, HTML codes; that's ok. --Z 08:14, 3 June 2013 (UTC)

Fixed now. --Z 12:52, 3 June 2013 (UTC)

Holding and accessing language data

I think it makes more sense to create a table that contains the information about a language, language = languages[language_code] (the variable for the language code is currently lang), instead of having a variable for the language code and indexing the big languages table everywhere. But in that case, we would need to add a field for the language code in language. On the other hand, if we make this change, and alter the lang argument (a string holding the language code) to language (a table that contains the language info), it will make the job a bit harder for functions that are supposed to be invoked directly from templates and use these functions. --Z 08:26, 24 June 2013 (UTC)
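
A rough sketch of this idea follows. It is not taken from Module:languages: the sample data, the `names` field, and the helper names `get_language` and `resolve_language` are all assumptions made for illustration (only a `scripts` list is mentioned elsewhere on this page). The second helper shows one possible way to soften the drawback for template-invoked functions, by accepting either a code string or a language table.

```lua
-- Hypothetical sample data; the real table lives in Module:languages.
local languages = {
    en = { names = { "English" }, scripts = { "Latn" } },
    fa = { names = { "Persian" }, scripts = { "Arab" } },
}

-- Look a language up once and attach its code to the returned table,
-- so callers no longer need to carry the code around separately.
local function get_language(code)
    local data = languages[code]
    if not data then
        return nil
    end
    local language = { code = code }
    for key, value in pairs(data) do
        language[key] = value
    end
    return language
end

-- Accept either a code string (as passed from templates) or an
-- already-resolved language table (as passed from other Lua code).
local function resolve_language(lang)
    if type(lang) == "string" then
        return get_language(lang)
    end
    return lang
end
```

For example, `resolve_language("fa").scripts[1]` and `resolve_language(get_language("fa")).scripts[1]` both give the first script listed for Persian in the sample table.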

language_link() and script detection

We can improve the script detection feature so that it checks each term that is going to be linked separately, rather than treating the input as text written in a single script and checking the whole input once: {{term|[[鳥居]] / [[とりい]] / [[torii]]|lang=ja}} (so that we won't have to write {{term|鳥居|lang=ja}} / {{term|とりい|lang=ja}} / {{term|torii|lang=ja}}). To do this, we have to merge language_link() into annotated_link(), but this has some disadvantages... --Z 19:27, 24 June 2013 (UTC)

Template:recons with empty first parameter

{{recons}} is sometimes used without the first parameter. This removes the link, and just applies formatting to the bare word. These are now causing script errors, the module apparently requires the word parameter. But it should only be required for {{l}} (I think?) not for {{term}} or {{recons}}. —CodeCat 20:25, 5 July 2013 (UTC)

That's true, fixed. --Z 20:39, 5 July 2013 (UTC)

diacritic removal for Cyrillic broken

ѝзранити - when this starts generating red links, it doesn't work. --Ivan Štambuk (talk) 14:59, 27 July 2013 (UTC)
- That's strange. Apparently, ѝ is a single composed character, and not и with a combining diacritic. So the replacement didn't catch. I added the composed character ѝ for Serbo-Croatian now, hopefully that fixed it. —CodeCat 15:41, 27 July 2013 (UTC)
  - I think that when you save и with a combining diacritic it automatically gets converted to ѝ by MW. --Ivan Štambuk (talk) 15:49, 27 July 2013 (UTC)
    - Yes, it tries to create composed characters whenever it can. In this case it's just a bit surprising because only ѐ and ѝ exist as composed characters, not а, о, etc. I wonder why they made Unicode that way. —CodeCat 16:30, 27 July 2013 (UTC)

Merging template_l and template_term

I'm going to merge the template_l and template_term functions, like this (tests); the template's title (either "l" or "term") should be provided to the module through the parameter "template". Any objections/suggestions? Any suggestions about the name of the new function? --Z 22:39, 27 July 2013 (UTC)

One of the consequences of this is that the fourth parameter in {{l}} can do the job of the horrible "gloss" parameter and makes it deprecated, so we can get rid of it and make the two templates more similar to each other. --Z 22:50, 27 July 2013 (UTC)

Also, "id" will be available in {{term}}, and so will "lit" and "pos" in {{l}}. --Z 22:52, 27 July 2013 (UTC)

(edit conflict)

"link term" sounds a bit like it's meant to "link a term". "template_l_term" is probably more accurate?
It looks like you have made both {{term}} and {{l}} use either the gloss= parameter or the parameter that follows alt. It's not a problem (so that both are equivalent until we decide which one to keep), but if you add that then you should check any usages of {{l}} that currently already have a 4th parameter, and any usages of {{term}} that already have gloss=, just in case someone provided those parameters in the past and they are still present in entries. We don't want those old mistakes to be interpreted wrongly when those parameters suddenly become valid. The same applies to g= for {{term}} and to pos= and lit= for {{l}}, as these templates did not originally support those parameters.
I'm not quite sure why you are calling {{rfscript}} with "sc or langinfo.scripts[1]". Why not just sc?
I didn't check, but is merging the two the only change that you made, or did you also make changes to the other functions in the module? —CodeCat 23:00, 27 July 2013 (UTC)

Maybe even more accurate: "templates_l_term"? As one may think "template_l_term" is supposed to be used in a template called "l term" or "l-term".
One may have already used the fourth parameter of l instead of its third parameter, so we should check it, but why should we check "gloss" in term, or "lit" and "pos" in l? Even if someone has already used them, I don't think the user meant anything but what these parameters will mean after our change.
We put the sc = langinfo.scripts[1] part inside the "if (term or alt) then" block yesterday, so sc may be nil. In this case, the template itself checks for the name of the script, but it is faster to do it directly in the module.
There are other changes, but I only want to replace "template_l" and "template_term" with "link_term" for now. --Z 23:36, 27 July 2013 (UTC)

I think "template_l_term" is better, there is not that much risk of confusion.
It's unlikely, but it can't hurt to check...
Oh yes, I didn't realise. You can't do script detection if there is no text to detect it from.

—CodeCat 23:52, 27 July 2013 (UTC)

Done. --Z 21:21, 28 July 2013 (UTC)

Lua-izing `{{term}}`

Unlike other major linking templates {{l}} and {{t}}, {{term}} takes the language code through the named "lang" parameter. There were discussions and efforts regarding fixing this, which went nowhere. The module is ready to be used in {{term}}, though, since we can use the "compat" option. See tests (see the "Actual" column, ignore the "Expected"). BUT: it's the best time to make any other changes and improvements to the template. The easiest way to get rid of the "lang" parameter is creating a Lua-ized version of {{term}} under another, better title and replacing usages of {{term}} (when the "lang" is specified) with the new one. Any thoughts? Here is a discussion about using {{g}} instead of the gender templates; if we do this, we can choose {{m}} (mentioned term; the best title IMO) as the new title, and by this change we can probably use {{mt}} (mentioned term). --Z 21:50, 28 July 2013 (UTC)

Unfortunately, some people prefer keeping all these old templates around, so there is no consensus to delete them. Apparently, the new situation is too complicated, but I'm guessing it's only complicated because they don't want to adapt? —CodeCat 21:56, 28 July 2013 (UTC)

Yes, I think people are opposing because they don't get that this is really an improvement; maybe we should bring this up at BP again and explain the issue more exactly. Why should we have a four-letter title for such a highly used template, and a one-letter one for a template which is normally NOT supposed to be used anywhere, because we have head, t, l, and eventually g? If we could delete {{m}}, we could have enjoyed our Lua-ized {{term}} with its new title a long time ago, and put the time and energy that is being wasted into improving it instead. --Z 22:41, 28 July 2013 (UTC)

I'm kind of tired of hitting walls though when I try to improve things and people complain that it's too complicated because they have to type more or because it's just not what they're used to. It's almost an automatic response... —CodeCat 22:46, 28 July 2013 (UTC)

Why do we even need a template like {{term}}? The only difference from {{l}} is that it italicizes, right? Is it worth having a separate template just for italicizing? Besides, only Latin-script words are italicized. --Vahag (talk) 23:04, 28 July 2013 (UTC)

No, the term should actually be tagged with a CSS class (also, it's not possible to italicize only the term and not other parts of the output using only l and ''). --Z 23:58, 28 July 2013 (UTC)

What about {{M}}? We can move it to {{m}} later when we have removed the gender templates. --Z 18:37, 30 July 2013 (UTC)

Here are the options we have right now:

1. Semi-Lua-izing {{term}}, using the compat mode. We have to use "lang" in this case.
2. Putting the Lua-ized version of {{term}} under another [temporary] title, say {{M}}, and considering {{term}} a deprecated template.
3. Removing {{m}} and using that.

The advantage of 2 is that we can temporarily enjoy using the template without "lang", and having a short title. Later we may decide to change it to another title, e.g. {{m}}. We will run a bot on all pages that have used {{term}} with the "lang" specified, and replace them with the new template. For those which don't have "lang", we have two options: (2.1) converting them to the new template by hand, adding a language code to the new template, or (2.2) replacing them with {{M/m||...}}, by bot.

By choosing 1, the job will be a bit harder: first we should add "|lang=" by bot to all usages of {{term}} which don't have "lang". Then we can change the module in a way that if "lang" is specified (even with an empty string as its value) then it uses the old parameters (using "lang" instead of the first parameter and so forth), so people would be able to use {{term}} by passing the language code to the first parameter, too. Then we can run a bot on all usages of {{term}} to remove "|lang=(...)" and pass its value (which may be an empty string) to the first parameter. I said we should add "|lang=" and not "|lang=und" because the latter means undetermined, while in this case they are unspecified; the current code considers them identical, but we may (should) change that behavior in future. Later we may decide to change it to another title, e.g. {{m}}.

The third option is the best, but we can't choose it without convincing the community and removing the gender templates.

We should choose one of these ASAP. It's ridiculous that we have features but the community still can't use them. --Z 21:29, 30 July 2013 (UTC)

I think I prefer both 1 and 2 for now. We should definitely make the new format available (just like we have {{label}} next to {{context}}), but we can't just suddenly switch over because there are many entries to fix. Providing the language code to 20 thousand entries is going to be a huge task, so I think it may be better if we migrate only the cases that have a language code, leaving the rest as {{term}}. Even if we do have a consensus to use {{m}}, that template is still added to entries by bots (yeah... instead of {{head}}... :/). So if we orphan it, we can't be sure that someone's bot won't add it somewhere and break things in ways we hadn't foreseen. On the other hand, the new {{m}} would require a language code as the first parameter, so if someone's bot starts adding it to entries, it will start triggering script errors and alert us to the problem. —CodeCat 21:48, 30 July 2013 (UTC)

As I explained, we can migrate ALL cases: if the language is not provided, we can use {{M||text}} and the module will behave exactly like {{term|text}}. --Z 22:10, 30 July 2013 (UTC)

I don't know if I really like that idea. If people can leave out the language in this new template, then they probably will, and the whole problem starts all over again. I'd rather make the conventions for this new template very rigidly set in stone, so that it's clear how people can and can't use it. A script error is as close as it gets to "you're not doing it right", so we should definitely use that. —CodeCat 22:20, 30 July 2013 (UTC)

We can replace them with {{M|und|text}}. I don't think people will write {{M||text}} or {{M|und|text}} when the language is not undetermined; the main reason that they didn't specify a language code for so many {{term}}s is actually that it had a named parameter. Compare {{l}}, for which users have always specified the language code. --Z 22:30, 30 July 2013 (UTC)

Let's start by converting {{term}} to Lua first. Hopefully that won't give too many problems. We also need to work on {{termx}}, which should be orphaned altogether, but only once {{term}} can support reconstructed languages. —CodeCat 02:16, 31 July 2013 (UTC)

What's next? I want to bring up the gender templates issue again at BP, let me know if you have any plans for the template. --Z 00:19, 4 August 2013 (UTC)

I think it's more or less complete. I've been orphaning both {{termx}} and {{recons}} in the last few days, and I've been trying to deal with the script errors that have been showing up. That will probably take a day or two. {{compound}} and {{suffix}} have also been converted to Lua. {{compound}} is "nice" as far as code goes, but {{suffix}} is a bit more hackish because I focused more on replicating existing behaviour instead of changing or adding new things. That's for later, when we decide what we want. {{prefix}} and {{confix}} will need to be converted as well, that won't take too long. —CodeCat 00:23, 4 August 2013 (UTC)

Links to "und"

The current version of the module will happily create links to appendix pages when the language is "und". But this is undesirable. Links to "und" are ok for the main namespace; the section id should be empty then. But for appendix pages, there should really be no link at all. This should be added to language_link but I'm not sure how. The best way seems to me that language_link should just return nil when it's not able to make a link. So language_link("attested", "alt", "und") should give [[attested|alt]] (without a #), but language_link("*reconstructed", "alt", "und") should give alt with no link at all.

I'm not sure what it should do when the term contains embedded wikilinks. What if someone writes: language_link("[[attested]] [[*reconstructed]]", nil, "und")? The most sensible thing to me would be to try to process each individual link the same as it would normally, so that this gives [[attested]] *reconstructed. I suppose that the function should only return nil if it could not create any links at all, so language_link("[[*rec1]] [[*rec2]]", nil, "und") should return nil, but I don't know if that is feasible.

An alternative we could try is to just return the link-less text instead of nil, but that seems to go slightly against the idea of making "links". What would be more useful? —CodeCa t 22:37, 28 July 2013 (UTC)Reply
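The behaviour proposed above can be sketched in plain Lua. This is only an illustration, not the module's code: link_one and language_link_sketch are hypothetical names, the sketch covers only the "und" case (no section anchors for other languages), and it takes the "return nil when no link could be made" option, leaving it to the caller to fall back to the alt text.

```lua
-- Hypothetical helper: build one link, or return nil when no link is possible.
local function link_one(target, display, lang)
	-- Reconstructed terms are marked with a leading asterisk; there is no
	-- Appendix:Undetermined/ to link to, so "und" gets no link for them.
	if lang == "und" and target:sub(1, 1) == "*" then
		return nil
	end
	display = display or target
	if display == target then
		return "[[" .. target .. "]]"
	end
	return "[[" .. target .. "|" .. display .. "]]"
end

-- Hypothetical stand-in for language_link, sketching only the "und" handling.
local function language_link_sketch(text, alt, lang)
	-- No embedded wikilinks: treat the whole text as a single target.
	if not text:find("[[", 1, true) then
		return link_one(text, alt, lang)
	end
	-- Embedded wikilinks: process each one separately and ignore alt.
	local made = 0
	local result = text:gsub("%[%[(.-)%]%]", function(inner)
		local target, display = inner:match("^(.-)|(.*)$")
		target = target or inner
		local link = link_one(target, display, lang)
		if link then
			made = made + 1
			return link
		end
		-- Unlinkable: keep the visible text without brackets.
		return display or target
	end)
	-- Return nil only if not a single link could be created.
	if made == 0 then
		return nil
	end
	return result
end
```

With this sketch, the four cases from the comment above give [[attested|alt]], nil, [[attested]] *reconstructed and nil respectively.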

Why should we link to the term when the lang is und and the term is attested? --Z 22:57, 28 July 2013 (UTC)Reply

Category:Undetermined language that's why. ==Undetermined== is actually a valid language header. But Appendix:Undetermined/ is not valid as far as I can tell. So this language is kind of unique: it can be attested, but not reconstructed, whereas all other languages can be reconstructed, but some can't be attested. There's also another reason. {{term}} is still missing the language code on about 20 thousand entries, and the module currently treats this as "und"... which would then remove the link. I don't think we want that. —CodeCa t 23:06, 28 July 2013 (UTC)Reply

Oh, didn't know that. {{term}} was not a problem though: since it is not used to link to any appendix so far, all we actually need is if term and (compat or lang ~= "und") then

I think text like attested [[*reconstructed]] is unlikely to appear in the term variable while the lang is und. Editors should be aware that we don't link to reconstructed terms. We have already made linking to reconstructed terms complicated, and I'm not sure this highly unlikely case is worth making the code even more complicated. So we should not call language_link when the term has "*" at the beginning. There's a way to always fix this though: replace "[[Appendix:Undetermined/(...)|(...)]]" with the second capture. But I fear that if we continue to handle rare issues like this, the code will eventually become full of fixes of fixes of fixes... --Z 23:45, 28 July 2013 (UTC)Reply
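The "replace with the second capture" fallback mentioned above could look roughly like the following. It is a sketch only: strip_und_appendix_links is a hypothetical name, and it assumes such links always appear in piped form.

```lua
-- Replace [[Appendix:Undetermined/page|display]] with just the display text.
-- The pattern has two captures (page name, display text); "%2" keeps the second.
local function strip_und_appendix_links(text)
	-- The outer parentheses drop gsub's second return value (the match count).
	return (text:gsub("%[%[Appendix:Undetermined/([^|%]]*)|([^%]]*)%]%]", "%2"))
end
```

Other links in the same text are left untouched, since the pattern only matches targets that start with Appendix:Undetermined/.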

They're not so rare, though. There are many links with "und" as the code, using {{l}}, {{term}} and {{recons}}. So we can't just ignore it. And the handling of "und" must be done inside language_link, because that function is not used by just {{l}} and {{term}}, other modules also use it. It's better to make the code robust now, rather than regret it later when things start behaving strangely. —CodeCa t 23:57, 28 July 2013 (UTC)Reply

OK, done. Use User:ZxxZxxZ/links/User:ZxxZxxZ/term to test it if you want, or see User talk:ZxxZxxZ/links/User talk:ZxxZxxZ/term, I've added a test for this case. --Z 03:17, 29 July 2013 (UTC)Reply

I also added a "curtitle" argument to language_link; when it is provided, the function doesn't link to the current title. --Z 03:19, 29 July 2013 (UTC)Reply

Does that work even with embedded wikilinks? —CodeCa t 10:45, 29 July 2013 (UTC)Reply

Yes (did you see testcases?) {{User:ZxxZxxZ/links|und|[[attested]] .. [[*unattested]]}} -> User:ZxxZxxZ/links, {{User:ZxxZxxZ/links|und|*[[unattested]] .. [[unattested|alt]]}} -> User:ZxxZxxZ/links --Z 16:35, 29 July 2013 (UTC)Reply

It looks good. Can you make the changes to Module:links? —CodeCa t 16:58, 29 July 2013 (UTC)Reply

Done, please improve (wording, etc) / add comments as you see fit. --Z 18:33, 29 July 2013 (UTC)Reply

The format of the annotations

Latest comment: 11 years ago2 comments2 people in discussion

I noticed that we have two different formats for genders, inflections and annotations. Which one we use depends on the template:

{{head}}: term (tr) gender (inflections)
{{term}} and {{l}}: term gender (tr, glosses)
{{t}}: term (tr) gender

I think we should make all of these show the same, but I'm not sure which way. I think the transliteration should come right after the term, but that would look like this on {{term}} when other glosses are also given: term (tr) gender (glosses). And if there's no gender (which is most of the time) then it becomes: term (tr) (glosses). That doesn't look so nice. So what should we do? —CodeCa t 12:47, 1 August 2013 (UTC)Reply

We can split tr and gloss only when we have gender. --Z 21:15, 1 August 2013 (UTC)Reply
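One way to read the suggestion above is sketched below: transliteration and glosses share one pair of parentheses unless a gender has to go between them. The function name format_annotated and its argument order are made up for the example, and real glosses would carry more markup.

```lua
-- Hypothetical formatter: term, optional transliteration, gender and gloss.
local function format_annotated(term, tr, gender, gloss)
	local parts = { term }
	if gender then
		-- With a gender: term (tr) gender (gloss)
		if tr then
			table.insert(parts, "(" .. tr .. ")")
		end
		table.insert(parts, gender)
		if gloss then
			table.insert(parts, "(" .. gloss .. ")")
		end
	else
		-- Without a gender: term (tr, gloss), avoiding "(tr) (gloss)"
		local annotations = {}
		if tr then
			table.insert(annotations, tr)
		end
		if gloss then
			table.insert(annotations, gloss)
		end
		if #annotations > 0 then
			table.insert(parts, "(" .. table.concat(annotations, ", ") .. ")")
		end
	end
	return table.concat(parts, " ")
end
```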

Diacritic removal for Lithuanian and Latvian

Latest comment: 10 years ago2 comments2 people in discussion

tumė́ti should link to tumėti, vil̃na should link to vilna and similar. See more on w:Lithuanian accentuation and w:Latvian_language#Pitch_accent. --Ivan Štambuk (talk) 22:59, 15 August 2013 (UTC)Reply

This should really go at Module:languages, that's where it's defined. Which diacritics should be removed from each language? —CodeCa t 11:49, 18 August 2013 (UTC)Reply

Mycenaean Greek italicized

Latest comment: 10 years ago3 comments3 people in discussion

When it shouldn't be, see etymology at: ναῦς. --Ivan Štambuk (talk) 09:39, 18 August 2013 (UTC)Reply

It doesn't appear italic for me? —CodeCa t 11:48, 18 August 2013 (UTC)Reply

It was a minor issue in common.css, fixed it. --Z 14:23, 18 August 2013 (UTC)Reply

"*" before automated transliterations of reconstructed terms

Latest comment: 10 years ago2 comments2 people in discussion

Shouldn't we remove it? --Z 19:50, 20 August 2013 (UTC)Reply

Maybe. I'm talking to Vahag right now about how to treat scripts for reconstructed terms. Latin script is often used for reconstructions even if it's not a native script of the language otherwise. So detect_script should probably do something special with the *, while it should be removed before transliterating. —CodeCa t 20:22, 20 August 2013 (UTC)Reply

broken broken broken

Latest comment: 10 years ago16 comments3 people in discussion

It's broken when used in {{term/t}}. Why don't you do test edits on some test module, instead of live system?! --Ivan Štambuk (talk) 13:45, 26 August 2013 (UTC)Reply

I can't fix it if you don't tell us where it's broken. Got an example? —CodeCa t 13:46, 26 August 2013 (UTC)Reply

See: Appendix:Proto-Slavic/bьrdo. --Ivan Štambuk (talk) 13:52, 26 August 2013 (UTC)Reply

That wasn't caused by any of my recent edits. I fixed it, though. —CodeCa t 14:20, 26 August 2013 (UTC)Reply

It was caused by target2 == export.make_pagename(linktitle, lang) that you added at the end of the "core" function. Linking to appendix with an alt which doesn't start with "*" is not allowed now (see the tests 10, 11, 12 of Module talk:links/testcases). Is it a good idea? --Z 14:28, 26 August 2013 (UTC)Reply

I didn't realise that make_pagename would trigger errors like that. I don't think it's necessarily a bad idea either though, because we caught the above "mistake" thanks to it. Otherwise, the * would have been missing from the displayed link, and there wouldn't have been an indication that it's reconstructed. —CodeCa t 14:42, 26 August 2013 (UTC)Reply

Did you see the testcases that I mentioned? In the past you discussed with me changing the module in such a way that it works with cases like {{l|sla-pro|*[[dьnь]] [[dьnь]]}}, but they won't work after your change... --Z 14:56, 26 August 2013 (UTC)Reply

Ok, that is a bit of a problem but I'm not quite sure how to fix it. The purpose of the code I added is to see whether the alternative form in the link is redundant, because the page being linked to would be the same anyway once diacritics are removed from it. In other words, it's meant to tell us when {{l|xx|term|alt}} can be safely converted into {{l|xx|alt}}. —CodeCa t 15:32, 26 August 2013 (UTC)Reply

Easy to fix: if the language is a reconstructed one, pass "*" .. linktitle (instead of linktitle) to make_pagename. --Z 10:53, 27 August 2013 (UTC)Reply
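A rough sketch of that fix follows. Everything here is a stand-in: the reconstructed table, this make_pagename and pagename_for_linktitle are hypothetical and only imitate the behaviour described in this thread (an error when a reconstructed language gets a term without "*"). Prefixing only when the asterisk is missing is an extra assumption of the sketch.

```lua
-- Hypothetical data: codes of reconstructed languages and their names.
local reconstructed = {
	["sla-pro"] = "Proto-Slavic",
	["gem-pro"] = "Proto-Germanic",
}

-- Stand-in for make_pagename: raises an error for unmarked reconstructed terms.
local function make_pagename(text, lang)
	local name = reconstructed[lang]
	if name then
		if text:sub(1, 1) ~= "*" then
			error("term in unattested language " .. name .. " is not marked with '*'")
		end
		return "Appendix:" .. name .. "/" .. text:sub(2)
	end
	return text
end

-- The suggested fix: add the asterisk before asking for the page name.
local function pagename_for_linktitle(linktitle, lang)
	if reconstructed[lang] and linktitle:sub(1, 1) ~= "*" then
		linktitle = "*" .. linktitle
	end
	return make_pagename(linktitle, lang)
end
```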

I'll leave it for now, though. I just found a second entry that was missing the * in the alt form. So it can be useful. —CodeCa t 11:04, 27 August 2013 (UTC)Reply

OK, but putting them in a category is a more appropriate way to find these mistakes, instead of returning script error. --Z 12:36, 27 August 2013 (UTC)Reply

I tried to implement your suggestion above, by making it add * to the linktitle before calling make_pagename. But it doesn't seem to be working. —CodeCa t 11:19, 28 August 2013 (UTC)Reply

The categories are causing this, because they are being added at the middle of the process of searching for embedded links (see this test and the value of the target variable mentioned in the error message which is related to the third capture of the regex). So the solution is inserting the categories in a table and adding them to the text just before returning the text (at the end of language_link), like what you've done in another module, headword or translations if I'm not mistaken. --Z 15:54, 28 August 2013 (UTC)Reply
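The "collect first, append last" approach described above might be sketched like this. link_all is a hypothetical name, the section name is passed in directly, and the redundancy test is reduced to "target equals display" for brevity.

```lua
-- Process embedded links, collecting tracking categories in a table
-- instead of writing them into the text while links are still being matched.
local function link_all(text, langname)
	local categories = {}
	local linked = text:gsub("%[%[(.-)%]%]", function(inner)
		local target, display = inner:match("^(.-)|(.*)$")
		if target and target == display then
			table.insert(categories, "[[Category:Link alt form tracking/redundant]]")
		end
		return "[[" .. (target or inner) .. "#" .. langname .. "|" .. (display or inner) .. "]]"
	end)
	-- Only now, after all link processing is finished, add the categories.
	return linked .. table.concat(categories)
end
```

Because the category wikitext is appended only at the very end, it can never be mistaken for an embedded link during processing.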

That fixed it, thank you! —CodeCa t 16:00, 28 August 2013 (UTC)Reply

I noticed that now, any embedded links to reconstructed terms that don't have an alt form embedded in the link will be marked as redundant: {{l|gem-pro|*[[dagaz]]}}. It's not a serious issue, but worth noting. —CodeCa t 16:09, 28 August 2013 (UTC)Reply

I haven't tested it but that's probably happening only for links like {{l|gem-pro|*[[dagaz]]}} (but not {{l|gem-pro|[[*dagaz]]}} etc.) because *[[dagaz]] will be turned into [[*dagaz|dagaz]] at the "fix for linking to unattested terms (...)" part, which itself becomes [[*dagaz|*dagaz]] in core. We can fix this by checking if new is equal to target AND new is not equal to linktitle, after the line new = export.make_pagename(new, lang), in this case, it shouldn't be marked as redundant. --Z 16:27, 28 August 2013 (UTC)Reply

Link suffixes are not linked

Latest comment: 10 years ago5 comments3 people in discussion

See: запахи следοв (zapaxi sledοv). The declension suffixes should be linked. I think this module is responsible for it. Keφr 09:11, 12 September 2013 (UTC)Reply

If you want them to be linked, then you need to include them in the link... —CodeCa t 11:15, 12 September 2013 (UTC)Reply

No, I need n— wait, it only works for ASCII characters? Apparently: pyjům vs pyjem. Stupid. Keφr 11:45, 12 September 2013 (UTC)Reply

That is a problem with the software then, not with the module. But you can work around it anyway. —CodeCa t 11:50, 12 September 2013 (UTC)Reply

It behaves differently in different language editions of the wikis. --Z 12:17, 12 September 2013 (UTC)Reply

Newlines should be stripped

Latest comment: 10 years ago6 comments3 people in discussion

diplomacy

Please fix this. Keφr 14:46, 16 September 2013 (UTC)Reply

Fixed it for you.[1] ;) --Z 15:27, 16 September 2013 (UTC)Reply

OK seriously: I think it's not Module:links' job to strip final/initial newlines (nor may this always be wanted). If someone wants them to be stripped, s/he should use named parameters (|2=, in this case); otherwise, if the newline was added by mistake, I think it should be fixed in the code of the page where the template is used. --Z 15:28, 16 September 2013 (UTC)Reply

Okay. I was not aware of this trick. Well, sometimes the newline in the page markup is quite convenient and aids readability (in inflection templates, for example), so I would rather have it removed after being passed to the template. And I thought this would be the easiest way. Keφr 15:58, 16 September 2013 (UTC)Reply

You should probably escape newlines in templates by wrapping them in comments. —CodeCa t 16:06, 16 September 2013 (UTC)Reply

When invoking declension table templates in entries, it would defeat the purpose of having newlines in the first place. So no. Keφr 16:23, 16 September 2013 (UTC)Reply
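For reference, if the stripping were done on the Lua side after the parameter has been passed in, it would amount to something like the sketch below. The helper name trim is made up here; whether the module should do this at all is exactly what is disputed above.

```lua
-- Remove leading and trailing whitespace (including newlines) from a parameter.
local function trim(text)
	return text:match("^%s*(.-)%s*$")
end
```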

interwiki links should not be redirected

Latest comment: 10 years ago3 comments2 people in discussion

See Heisenberg uncertainty principle. Keφr 18:09, 17 September 2013 (UTC)Reply

I don't understand what you mean. —CodeCa t 18:47, 17 September 2013 (UTC)Reply

In the first link in the headword, the link to Wikipedia is redirected to a section named "English". Does little harm, but well… just feels kinda unclean. Keφr 18:59, 17 September 2013 (UTC)Reply

Punctuation

Latest comment: 10 years ago1 comment1 person in discussion

There are many more international punctuation symbols to be removed (e.g. Tibetan), I have added for removal a few more:

text = gsub(text, "[؟?¿!¡;՛՜ ՞ ՟？！।॥။၊]$", "")

These are all used at the end of sentences (except for the Spanish inverted ¿ and ¡), so they shouldn't cause problems for other languages. --Anatoli (обсудить/вклад) 00:44, 24 October 2013 (UTC)Reply
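A note on the line quoted above: it presumably relies on a Unicode-aware gsub (such as mw.ustring.gsub), because Lua's plain string.gsub works on bytes and a character class containing multibyte characters would not behave as intended. A byte-safe sketch of the same idea, with a hypothetical function name and the punctuation list copied from the line above, is:

```lua
-- Sentence-final punctuation to strip (list taken from the comment above).
local trailing_punctuation = {
	"؟", "?", "¿", "!", "¡", ";", "՛", "՜", "՞", "՟", "？", "！", "।", "॥", "။", "၊",
}

-- Strip at most one final punctuation mark by comparing whole UTF-8 sequences.
local function strip_final_punctuation(text)
	for _, mark in ipairs(trailing_punctuation) do
		if text:sub(-#mark) == mark then
			return text:sub(1, #text - #mark)
		end
	end
	return text
end
```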

|face=

Latest comment: 10 years ago6 comments3 people in discussion

I think {{l}} should accept a |face= parameter, to be used for example in headword templates (which may need to set |face=bold). template_l_term will need to be extended. Keφr 19:52, 5 November 2013 (UTC)Reply

I don't see why. What's wrong with three apostrophes? —CodeCa t 20:22, 5 November 2013 (UTC)Reply

They would also cover the transliteration, if supplied to (or generated by) {{l}}, and if I put them inside the template (as in, {{l|und|'''[[foo]]'''}}), it breaks some of the functionality provided by the template (like handling already-marked-up links; although it is somewhat doubtful that headwords would or should take advantage of it). Are you planning on staying, or am I asking too soon? Keφr 22:11, 5 November 2013 (UTC)Reply

There was recently a discussion about transliterations for inflected forms. I think the practice is not to add them. You can tell {{l}} to not show a transliteration by using tr=-. So this will work: '''{{l|xx|something|tr=-}}'''. —CodeCa t 23:15, 5 November 2013 (UTC)Reply

Template:yi-noun, Template:yi-proper noun and Template:yi-adj say otherwise. Keφr 08:45, 7 November 2013 (UTC)Reply

Let's not have templates enforce policies. If it is decided for a particular language to show transliterations for inflected forms, the templates should be able to handle that. The same applies to {{head}}, I had to use an ugly workaround for אךׄ and אכׄת. --Wiki Tiki 89 14:31, 7 November 2013 (UTC)Reply

Link alt form tracking/redundant, Link alt form tracking/redundant/ru

Latest comment: 10 years ago17 comments3 people in discussion

Is this piece of code still required? I've asked CodeCat but got no reply here. Every single Russian entry gets added to these categories (Category:Link alt form tracking/redundant and Category:Link alt form tracking/redundant/ru) when e.g. genitive sg and nominative pl (nouns) are added to the headword.

if target == new then
	tracking = tracking .. "[[Category:Link alt form tracking/redundant]][[Category:Link alt form tracking/redundant/" .. lang .. "]]"
end

--Anatoli (обсудить/вклад) 02:54, 6 December 2013 (UTC)Reply

It's not required, but still useful for tracking down cases where the second parameter isn't needed anymore. So I would prefer to keep it. Are you sure it's the headword that's doing it though? I thought it was the inflection tables, but they can be fixed fairly easily. —CodeCa t 03:16, 6 December 2013 (UTC)Reply

It can be both. See зуб мудрости. I don't know what's causing it but there are too many entries affected. If you're planning to work on it, please say so, otherwise it's a bit annoying to have correctly formatted entries added to categories nobody checks. --Anatoli (обсудить/вклад) 03:26, 6 December 2013 (UTC)Reply

The inflection templates can be fixed fairly easily. In the inflection templates, you can see that often the "form" is a wikilink, with both a page name and a display form. You can remove the page name part (which is the same as the display form without the accents, hence the redundancy), and then remove the link [[ ]]'s as well. So [[a|á]] becomes just á. It would help a lot if you could do this. I will look at the headword templates, but can you give an example where it occurs? зуб мудрости is not really a good example because the problem is in the linked display form, which has a redundant piped link. I've fixed it now: diff. —CodeCa t 03:32, 6 December 2013 (UTC)Reply
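The cleanup described above (dropping a piped page name when it is just the display form without accents) can be illustrated with a small sketch. It assumes that the only diacritic to remove is the combining acute accent (U+0301) used for Russian stress; make_entry_name and simplify_redundant_links are hypothetical names, not the module's functions.

```lua
local COMBINING_ACUTE = "\204\129" -- U+0301 encoded as UTF-8 bytes

-- Hypothetical stand-in for the page-name function: strip stress marks.
local function make_entry_name(display)
	return (display:gsub(COMBINING_ACUTE, ""))
end

-- Turn [[page|display]] into plain display when the page name is redundant.
local function simplify_redundant_links(wikitext)
	return (wikitext:gsub("%[%[([^|%]]+)|([^%]]+)%]%]", function(page, display)
		if make_entry_name(display) == page then
			return display
		end
		-- Returning nothing keeps the original link unchanged.
	end))
end
```

Links whose page name really differs from the display form are left alone, so only the redundant ones lose their brackets.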

Oops. I knew this type of link causes issues: зуб, I forgot to remove the pipe. Could you explain what you mean by removing the page name part? Do you mean from declension templates? Is {{ru-noun-1}} an example of one that doesn't have a page name part? --Anatoli (обсудить/вклад) 03:40, 6 December 2013 (UTC)Reply

Yes, that template has already been fixed, so the entries that use it don't appear in the tracking categories. I think {{ru-noun-3-а}} is the first in the category of templates that hasn't been changed. You will probably notice that after you fix the templates, some of the parameters are actually no longer used anywhere by the template. This is ok; it just means that the templates and the entries that use them will need to be looked at and fixed to change the parameters around, but you don't need to do that unless you want to (you would need a bot to make it easy). —CodeCa t 03:47, 6 December 2013 (UTC)Reply

I've made the changes to {{ru-noun-3-а}} now, to show what needs to be done: diff. I hope that helps. —CodeCa t 03:53, 6 December 2013 (UTC)Reply

I see. It does help, thanks. I'll try to do it. --Anatoli (обсудить/вклад) 04:09, 6 December 2013 (UTC)Reply

I think we should eliminate these sorts of pseudo-cleanup categories, where there's absolutely nothing wrong with the page. If CodeCat wants to spend her time pseudo-cleaning-up such pages, I guess that's fine, but the category confuses other editors into thinking they're doing something wrong. —Ruakh_TALK 07:56, 6 December 2013 (UTC)Reply

I personally don't mind having these categories for entries I work with as long as they are understood and there's some action plan. I agree that we could use some prior discussion but this change was kind of expected with the work CodeCat was doing with the Russian declension templates. --Anatoli (обсудить/вклад) 08:47, 6 December 2013 (UTC)Reply

The categories are hidden, so they shouldn't really get in anyone's way, I think? I'm not sure if I understand why people would think there's something wrong if they can't see it. And if they have hidden categories turned on, then... well then that's kind of up to them isn't it? —CodeCa t 14:41, 6 December 2013 (UTC)Reply

Firstly — I've definitely seen these, or their friends, as red categories at the bottom of the page. That means that not only are they not hidden, but they actually stand out more than regular categories. Secondly, even when these categories are properly created and hidden — real cleanup categories are also hidden. Editors who want to help clean up real issues shouldn't be tricked into thinking they should also clean up fake ones. (If you really want a hidden list, I suppose you can use something like pcall(require, 'Module:User:CodeCat/Link alt form tracking/redundant') and Special:WhatLinksHere/Module:User:CodeCat/Link alt form tracking/redundant. It's still worse than nothing, but I think it's at least better than what we have now. Though DCDuring might disagree with me, since that will break the WantedPages list . . .) —Ruakh_TALK 21:57, 6 December 2013 (UTC)Reply

The category was empty at one point, so later changes have repopulated it. We should probably try to empty it out again, starting with the Russian one. Once it's empty, we can get rid of the language-specific subcategories if needed, so they won't appear as red anymore. —CodeCa t 22:35, 6 December 2013 (UTC)Reply

Re: "We should probably try to empty it out again, starting with the Russian one": Why? Why does it matter? What are the disadvantages of these "redundant" entry-names? What are the advantages of removing them? —Ruakh_TALK 04:16, 7 December 2013 (UTC)Reply

I don't know, it just seems neater that way. —CodeCa t 04:18, 7 December 2013 (UTC)Reply

Right: no advantages. If you want to neaten up entries you happen to be editing, or even to seek out entries for neatening, that's fine; and heck, if you want to run a bot to neaten up these entries, O.K., I think that's probably fine. (It's not the very best idea, because of course bot-edits have nonzero risk, but whatever.) But there's really no justification for a cleanup category. Please remove the piece of code that Anatoli refers to. —Ruakh_TALK 04:41, 7 December 2013 (UTC)Reply

Thanks. :-) —Ruakh_TALK 21:59, 7 December 2013 (UTC)Reply

Sindhi diacritics

Latest comment: 10 years ago1 comment1 person in discussion

Sindhi diacritics should work the same way as Arabic, Persian and Urdu, so it should be the same treatment. The translation adder made: لُغَتُ on dictionary#Translations but it should be just لُغَتُ (currently shows in red) with diacritics automatically removed. The tool knew how to link to the entry without diacritics but used "alt=". (I haven't added the transliteration but it's something like "luğatu"). --Anatoli (обсудить/вклад) 02:29, 3 February 2014 (UTC)Reply

Korean transliteration

Latest comment: 10 years ago7 comments2 people in discussion

It fails on this module when there is a manual transliteration. There are thousands of translations with manual transliteration, and in this case it's desirable because it capitalises proper nouns (romaja is usually capitalised for place names):

Manual: 미얀마 (ko) (Miyanma), 버마 (ko) (Beoma)
Automatic: 미얀마 (ko) (miyanma), 버마 (ko) (beoma)

--Anatoli (обсудить/вклад) 05:57, 28 May 2014 (UTC)Reply

Fixed - missing "s" in "annotations". Wyang (talk) 06:05, 28 May 2014 (UTC)Reply

Thanks. Now I see automatic overrides manual, at least for Korean. Is that good? Probably OK, although romaja should capitalise place names. It's NOT OK for languages such as Russian, which have unhandled exceptions and whose word stresses are given in the transliteration but not in the native script. --Anatoli (обсудить/вклад) 06:09, 28 May 2014 (UTC)Reply

I think something is better than nothing at all. I've enabled putting a "^" in front of the letter to be capitalised to allow capitalisation for languages whose script has no case distinction. I don't know Russian well but do you think it'd be possible to make a pronunciation module for Russian? Using accent marks, and some extra tricks for irregularities. Wyang (talk) 06:26, 28 May 2014 (UTC)Reply
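How the "^" marker might be honoured can be sketched as a post-processing step on the romanised string. This is an assumption about the mechanism, not a description of the Korean module: it supposes that the caret survives transliteration and ends up directly before an ASCII letter. The function name is hypothetical.

```lua
-- Capitalise the letter following each "^" and drop the marker itself.
local function apply_caret_capitals(translit)
	return (translit:gsub("%^(%a)", function(letter)
		return letter:upper()
	end))
end
```

So a romanisation such as ^miyanma would come out as Miyanma, while unmarked text stays lower-case.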

I noticed that. Yes, I think it's OK. If Korean transliteration is reliable, we can sacrifice the capitalisation or use ^, as you did. Re: a pronunciation module for Russian. Yes, please! It is predictable in 95-99% of cases, there are some variants and exceptions can use phonetic respelling, e.g. сегодня as сево́дня (sevódnja). I can teach you some Russian too, if interested. --Anatoli (обсудить/вклад) 06:32, 28 May 2014 (UTC)Reply

Great, thanks. Now that I have the confirmation... If no one wants to take the lead, I might do so, in which case I will have to bombard you with questions. :) Wyang (talk) 06:48, 28 May 2014 (UTC)Reply

I have answered in Module talk:ru-pron. --Anatoli (обсудить/вклад) 07:02, 28 May 2014 (UTC)Reply

Linking to reconstructed terms when lang is und

Latest comment: 10 years ago1 comment1 person in discussion

I noticed that the module links to reconstructed terms while lang is "und": {{l|und|*term}} -> *term, as far as I recall we had fixed this before. --Z 13:57, 3 July 2014 (UTC)Reply

Not an actual problem, but curious behavior

Latest comment: 7 years ago3 comments3 people in discussion

I can't think of a situation in which this kind of use would arise and be needed (hence it isn't a problem per se that it doesn't work), but I noticed on one of my user-subpages that {{term|*?|lang=sca-pro}} produces a module error saying "Lua error in Module:links at line 102: The specified language Proto-Siouan-Catawban is unattested, while the given word is not marked with '*' to indicate that it is reconstructed" (even though the word is marked with '*'), while {{term|*??|lang=sca-pro}} works fine. - -sche (discuss) 01:27, 21 August 2015 (UTC)Reply

The last question mark is stripped as a punctuation character. Thus, your first example is really just *, which I guess is a special case to be able to link to asterisks, and the second one links to Appendix:Proto-Siouan-Catawban/? with just one question mark. --Wiki Tiki 89 01:40, 21 August 2015 (UTC)Reply

The question mark shouldn't be stripped if it's the only character in the string, just like * alone is a special case. The issue is that the asterisk is included in the code that removes the question mark, so it thinks the text consists of more than a question mark alone. Perhaps the "proper" solution is for the asterisk to be stripped before passing it to the conversion function. However, this edge case is so specific that it might not be worth the effort. —CodeCa t 00:23, 30 October 2016 (UTC)Reply
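The "strip the asterisk first" idea can be sketched as follows. The function name is hypothetical and only three ASCII punctuation marks are handled, to keep the example byte-safe in plain Lua.

```lua
-- Set the reconstruction asterisk aside, then strip final punctuation
-- only if something other than the punctuation mark itself remains.
local function strip_punctuation_keep_lone(text)
	local star, rest = text:match("^(%*?)(.*)$")
	if #rest > 1 then
		rest = rest:gsub("[?!;]$", "")
	end
	return star .. rest
end
```

With this ordering, *? keeps its question mark (it is the only character after the asterisk), while *?? still loses the final one.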

Disabling auto-translit

Latest comment: 8 years ago1 comment1 person in discussion

Is there a way to disable auto-translit if the link text equals — or —? KarikaSlayer (talk) 13:55, 6 July 2016 (UTC)Reply

Automatically replacing plain apostrophes with curly apostrophes in link text

Latest comment: 7 years ago13 comments3 people in discussion

Two recent discussions suggest to me that it would be ideal if this template automatically substituted plain apostrophes with a better-looking character in link text. In the Beer parlour (Wiktionary:Beer parlour/2016/October § ASCII vs. Unicode apostrophes in French entries) @Angr describes it as the usual practice to create French entry names with the plain apostrophe but to display the curly apostrophe (right single quotation mark) in headwords. In Wiktionary talk:About Ancient Greek § Symbol to mark apocope, I proposed that a similar thing be done for Ancient Greek headwords.

Anyway, I thought a similar thing should be done for links. It looks like the function makeLangLink, which deals with link text, would be the place to insert the code. I don't have template editor privileges, but I think the code to perform this replacement would look something like this:

if lang:getCode() == "fr" or lang:getCode() == "grc" then
	-- gsub (not sub) replaces every plain apostrophe with the right single quotation mark
	link.display = mw.ustring.gsub(link.display, "'", "’")
end

This would hopefully make {{m|fr|d'où}} automatically display as d’où, as if it had been produced by the code {{m|fr|d'où|d’où}}. Similarly, {{m|grc|ἀλλ'}} would automatically display as ἀλλ’ ‎(all'), like {{m|grc|ἀλλ'|ἀλλ’}}.

Ideally the curly apostrophe should also be used in the transliteration – ἀλλ’ ‎(all’). Perhaps the replacement should be done somewhere other than in the function makeLangLink, so that both the link text and the text used to make the transliteration already have the curly apostrophe. — Eru·tuon 00:13, 30 October 2016 (UTC)Reply
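A standalone sketch of the proposed display-side replacement, runnable outside the wiki, is given below. The table curly_apostrophe_langs and the function display_form are hypothetical names; in the module the language would be an object with getCode() rather than a bare code string.

```lua
-- Languages for which plain apostrophes are shown as curly ones (assumed set).
local curly_apostrophe_langs = { fr = true, grc = true }

-- Return the text to display for a link, leaving the link target untouched.
local function display_form(text, lang_code)
	if curly_apostrophe_langs[lang_code] then
		-- The parentheses drop gsub's second return value (the match count).
		return (text:gsub("'", "’"))
	end
	return text
end
```

If the same function were applied before transliteration as well, both the link text and the transliteration would receive the curly apostrophe, as suggested above.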

The code to create the page name for a given display form is actually in Module:languages, specifically makeEntryName. —CodeCa t 00:16, 30 October 2016 (UTC)Reply

I'm aware of that, and I'm talking about changing link text, not determining the entry name. — Eru·tuon 00:32, 30 October 2016 (UTC)Reply

So the reverse. Can we be absolutely sure that this change is always appropriate? In some languages, the apostrophe or a similar character (for which we might substitute an apostrophe) is a regular part of the orthography. And what about when ' is used as a quotation mark in an entry name? —CodeCa t 00:41, 30 October 2016 (UTC)Reply

I have not encountered quotation marks in entry names; could you give an example?

@CodeCat: It would certainly always be correct in Ancient Greek entries, since there is no other use of an apostrophe in that language. I would assume so in French, but I do not know for sure. 00:46, 30 October 2016 (UTC)Reply

I oppose automatically changing the displayed text for almost any reason, including this one. If we want to do this, it should be by changing the link target. --Wiki Tiki 89 00:45, 30 October 2016 (UTC)Reply

@Wikitiki89: Could you explain why? Do you also oppose displaying plain apostrophes in headwords as curly apostrophes? — Eru·tuon 00:47, 30 October 2016 (UTC)Reply

I don't oppose displaying curly apostrophes, I only oppose the automatic conversion of plain apostrophes to display curly apostrophes (by automatic, I am only referring to Lua modules). --Wiki Tiki 89 01:39, 30 October 2016 (UTC)Reply

@Wikitiki89: Very well, my second question was not worded well. Why do you oppose automatic conversion of plain to curly apostrophes as opposed to manually entering forms in which plain apostrophes are changed to curly into an |alt= parameter? The effect is the same; the only difference is that there is less repetitive work involved. — Eru·tuon 01:53, 30 October 2016 (UTC)Reply

It doesn't have to be an alt parameter. As I said, I'm ok with Lua automatically converting characters for the target link, just not for the display text. So {{l|fr|c’est}} can link to c'est, but {{l|fr|c'est}} should not display "c’est". --Wiki Tiki 89 01:59, 30 October 2016 (UTC)Reply
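The opposite direction described in this comment (normalise the target, never the display) might look like the sketch below. The names link_target and make_link are hypothetical; in the module this kind of conversion would belong with the entry-name logic mentioned earlier in the thread.

```lua
-- Convert curly apostrophes to plain ones for the page name only.
local function link_target(display)
	return (display:gsub("’", "'"))
end

-- Build a link whose visible text is exactly what the editor typed.
local function make_link(display)
	local target = link_target(display)
	if target == display then
		return "[[" .. display .. "]]"
	end
	return "[[" .. target .. "|" .. display .. "]]"
end
```

Here c’est links to c'est, while c'est is displayed exactly as entered.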

@Wikitiki89: Well, ideally that should happen too, but only having that would not be particularly useful to me as a Windows user. It's easy to type an apostrophe directly from the keyboard, but you have to use the annoying combination of Alt and 0145 to get a right single quotation mark, or navigate to the correct EditTools menu and select the character. Both are a hassle. It would make things much easier if the module did the work for me. So once again, why do you oppose having a module do it? — Eru·tuon 02:19, 30 October 2016 (UTC)Reply

For a number of reasons. It's bad practice in general. People expect things to display the way they are entered. As for entering them, just copy and paste it. It's not that hard. --Wiki Tiki 89 02:22, 30 October 2016 (UTC)Reply

Well, expectations are overruled by consensus regularly (though I admit that I can think of no functions that automatically modify displayed text from the form in which it was entered; if there were, it would probably be in this module and the headword module). If editors for a particular language have agreed to use a particular apostrophe character, it would be easier to enforce it through modules than to manually replace all apostrophes in all text in that language through all entries, and to have to regularly do cleanup to make sure that newly added text in that language has adhered to the standard. Easier to have a module automatically display c'est as c’est than to have to change ' to ’ wherever it occurs in French text. Of course, perhaps there is no such consensus regarding French. It would be easier to develop consensus for Ancient Greek, which I would think has a smaller group of editors. — Eru·tuon 03:41, 30 October 2016 (UTC)Reply

Phonetic extraction

Latest comment: 7 years ago10 comments3 people in discussion

@CodeCat, Wyang, I know this was mentioned elsewhere, but I'd like to bring it up directly. @Erutuon has moved the transliteration override data to mod:languages's data. If the phonetic_extraction data were similarly moved there, would this solution be amenable to all parties? —John C5 20:27, 14 March 2017 (UTC)Reply

No, because it's completely unnecessary to have it anywhere. All the code does is disable the regular transliteration in favour of a custom module which does the transliteration. If the data in Module:languages were changed so that the translit_module option points to that module, or the current Module:th-translit were modified to call it, then there would be no need for the custom code. —CodeCa t 20:31, 14 March 2017 (UTC)Reply

Ah, so that is still a no then. What ever happened to @Isomorphic's transcription vs. transliteration differentiation? I believe we were looking into that for both this and other languages. —John C5 20:57, 14 March 2017 (UTC)Reply

I don't think that removing the code should depend on the outcome of that. Wyang's code doesn't belong in Module:links or Module:languages no matter what happens, especially since there's an easy and obvious existing way to make it work without: translit_module. If it were just done that way, like all the hundreds of languages besides Thai already do, we wouldn't have this mess. —CodeCa t 21:05, 14 March 2017 (UTC)Reply

So, I believe as far as @Chuck Entz is concerned, this means that neither you nor Wyang will be reinstated as admins until you agree on a solution. I was just trying to see whether we'd made any progress in that respect. —John C5 21:11, 14 March 2017 (UTC)Reply

I presented the only proper solution that works within the existing framework of our modules; one that does not depend on the passing of any votes. Is there a particular reason why it's not taken? Why does Wyang's agreement determine my admin status? —CodeCa t 21:16, 14 March 2017 (UTC)Reply

I'm not arguing either side of this. Just looking for a solution. —John C5 02:22, 15 March 2017 (UTC)Reply

And here I was hoping there'd be one. Waste of time. —CodeCa t 02:23, 15 March 2017 (UTC)Reply

*sigh* I'm just trying to help, Code. I'm not your enemy. —John C5 02:37, 15 March 2017 (UTC)Reply

@JohnC5 I definitely think the phonetic_extraction data would more properly belong to the language modules, as a language-specific property. There are many languages that would benefit from the existence of such a function. Recently there was a discussion at User talk:DerekWinters#Assamese about how best to deal with languages such as Bengali and Assamese whose pronunciations are not very predictable from the spelling. A good solution for these languages would be to assign a respelling to a word, and follow Thai's practice. Wyang (talk) 06:50, 15 March 2017 (UTC)Reply

Unsupported titles

Latest comment: 7 years ago, 2 comments, 2 people in discussion

@Erutuon This seems like a perfect candidate to be moved into a data subpage to me. —John C5 15:25, 24 March 2017 (UTC)Reply

@JohnC5: I agree. Done. And now unsupported titles are linkable: :. — Eru·tuon 19:08, 24 March 2017 (UTC)Reply

Script tags for transliterations

Latest comment: 7 years ago, 11 comments, 4 people in discussion

@Erutuon, I think your recent edits here did something wonky to the fonts for the transliterations. Could you check it out? — justin(r)leung _{{ (t...) | c=› }} 04:14, 19 May 2017 (UTC)Reply

@Justinrleung: What do you mean? I was aware that adding, for instance, lang="ja" to Japanese transliteration caused transliterations to display with fonts more appropriate for Japanese script, but I changed it to lang="ja-Latn", which may not have the same effect. If it does have that effect, there is a discussion on this topic at Wiktionary:Grease pit/2017/May § CSS classes for transliterations. I do have an idea for how to solve this problem, which I mentioned in that discussion. — Eru·tuon 04:26, 19 May 2017 (UTC)Reply

@Erutuon: Thanks for pointing me to the discussion. I see that you have the same problem I pointed out. Thanks! — justin(r)leung _{{ (t...) | c=› }} 04:38, 19 May 2017 (UTC)Reply

@Justinrleung: Actually, the problem has been solved for me, ever since I changed the language code for transliterations to lang="language code-Latn". Rōmaji in Japanese headwords displays wrong, because it has class="Jpan"; but that has nothing to do with my recent changes. (I can't figure out how to fix it, unfortunately.) Could you point me to an entry in which you see the problem that you are talking about? — Eru·tuon 04:44, 19 May 2017 (UTC)Reply

@Erutuon: ghee is one of them; the Hindi, Urdu and Sanskrit transliterations aren't using the normal font. — justin(r)leung _{{ (t...) | c=› }} 04:48, 19 May 2017 (UTC)Reply

@Justinrleung: That sounds like what happened before. Maybe some browsers (not mine) ignore the -Latn part of the language attribute and just look at the language code part, applying fonts appropriate to the ordinary script of the language, but not appropriate for transliteration. That would be an argument for using class="tr-language code" instead. — Eru·tuon 04:53, 19 May 2017 (UTC)Reply

@Erutuon: It's a problem with Firefox, which I'm using, then. Your solution with class="tr-language code" might work better across browsers. — justin(r)leung _{{ (t...) | c=› }} 05:26, 19 May 2017 (UTC)Reply

@Justinrleung: Fortunately, now that most transliterations are tagged with Module:script utilities, this can be changed very quickly and easily. I'll post on the grease pit thread above, though, before changing anything. — Eru·tuon 05:31, 19 May 2017 (UTC)Reply
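As a rough sketch of the two tagging options discussed above (a lang attribute with a -Latn script subtag versus a class of the form tr-language code), assuming plain Lua and a hypothetical helper name `tag_translit_sketch`. This is not the actual function in Module:script utilities, and it does no HTML escaping.

```lua
-- Hypothetical helper: wrap a transliteration in a span, tagged either by
-- class ("tr-hi") or by language attribute ("hi-Latn").
local function tag_translit_sketch(translit, lang_code, use_class)
	if use_class then
		-- Option discussed as more robust across browsers.
		return ('<span class="tr-%s">%s</span>'):format(lang_code, translit)
	end
	-- Option that some browsers may read as plain "hi", ignoring "-Latn".
	return ('<span lang="%s-Latn">%s</span>'):format(lang_code, translit)
end
```

Because the choice is made in one place, switching between the two options would not require touching the templates that supply the transliteration.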

Don't some Japanese terms include transliterations in both Latin and Kana scripts? If so, then tagging the entire thing as Latin would be inappropriate. —CodeCa t 13:59, 19 May 2017 (UTC)Reply

Heh. I just checked the Japanese translation from water/translations and you're right: {{t+|ja|水|sc=Jpan|tr=みず, mizu}}. Both transliterations are supplied in the same parameter, so they are tagged the same way (and incorrectly for the Kana one). That's bad. — Eru·tuon 15:38, 19 May 2017 (UTC)Reply

@CodeCat, Erutuon: Yeah, that’s definitely a problem, whatever might be applied to the transliteration as a whole. The kana and Latin spellings need separate tagging, whether it is by script or class attributes. Looking over water/translations, I also see that Japanese (and others of the Japonic group such as Okinawan) seems to be the only case where this is done. We don’t e.g. use Zhuyin for Mandarin Chinese. I suppose the main reason that the kana spelling is supplied is that it is another possible spelling of the word, but as such, perhaps we should not treat it like a transliteration or transcription at all, but have a separate parameter for an equivalent spelling in a different mode of writing, e.g.{{t+|ja|水|sp2=みず|tr=mizu}}. I don’t know whether there would be use cases for other languages, but I suggest that the parameter be intended generally for equivalent respellings within an integrated writing system (in this case Jpan), and not for completely separate writing systems (e.g. Latin/Cyrillic, etc.). Of course, this would require a bot run through our Japanese/Japonic translations, and maybe a change to the Translation Adder, but I think it’s the best way to avoid these code errors and to structure our data appropriately. Also, another solution such as having Lua run through the transliteration looking for different script characters would make {{t}} more bloated. – Krun (talk) 14:07, 24 May 2017 (UTC)Reply
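To make the alternative mentioned at the end of the comment above concrete, here is a minimal sketch of Lua code that runs through a combined transliteration such as "みず, mizu" and assigns a script to each part. It assumes plain Lua with UTF-8 encoded strings; the function name `split_translit` and the returned table shape are hypothetical, and a module running on-wiki would more likely rely on existing script-detection facilities than on a byte pattern. As noted above, this kind of scanning adds work to every call.

```lua
-- Byte pattern for one UTF-8 encoded character in U+3040..U+30FF
-- (the hiragana and katakana blocks).
local KANA = "\227[\129-\131][\128-\191]"

-- Split a comma-separated transliteration and label each part with a
-- script code: "Jpan" if it contains kana, otherwise "Latn".
local function split_translit(tr)
	local parts = {}
	for piece in tr:gmatch("[^,]+") do
		local text = piece:match("^ *(.-) *$") -- trim surrounding spaces
		if text ~= "" then
			parts[#parts + 1] = {
				text = text,
				sc = text:find(KANA) and "Jpan" or "Latn",
			}
		end
	end
	return parts
end
```

Each part could then be tagged separately, which addresses the incorrect tagging of the kana part without requiring a new template parameter.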

Edit request

Latest comment: 6 years ago, 2 comments, 2 people in discussion

@Erutuon Please replace lines 342-343 with the following:

		local class = ""
		
		if data.accel then
			class = "form-of lang-" .. data.lang:getCode() .. " " .. data.accel
		end
		
		-- Only make a link if the term has been given, otherwise just show the alt text without a link
		link = m_scriptutils.tag_text(data.term and export.language_link(data, allowSelfLink, dontLinkRecons) or data.alt, data.lang, data.sc, face, class)

—CodeCa t 12:46, 20 August 2017 (UTC)Reply

Done — Eru·tuon 18:10, 20 August 2017 (UTC)Reply

Adding ts= param

Latest comment: 6 years ago, 23 comments, 7 people in discussion

Per this discussion, can someone please add these lines. Thanks.

After line 288:

	elseif itemType == "ts" then
		tag = { '<span class="ts mention-ts Latn">/', '/</span>' }

Replace lines 321-322:

	-- Transliteration and transcription
	if data.tr or data.ts then

Replace line 330:

		if data.tr and data.ts then
			table_insert(annotations, require("Module:script utilities").tag_translit(data.tr, data.lang, kind) .. " " .. export.mark(data.ts, "ts"))
		elseif data.ts then
			table_insert(annotations, export.mark(data.ts, "ts"))
		else
			table_insert(annotations, require("Module:script utilities").tag_translit(data.tr, data.lang, kind))
		end

--Victar (talk) 15:24, 8 March 2018 (UTC)Reply

Done Thanks, @JohnC5 --Victar (talk) 02:43, 9 March 2018 (UTC)Reply

@JohnC5, Victar, Wikitiki89 Could one of you please add information on the correct use of |ts= to Template:link/documentation? I see from the discussion at Wiktionary:Beer parlour/2018/February#Transcription parameter again that most people consider it undesirable to use |ts= for IPA transcriptions, but that's exactly how I've been using it for Burmese, since the pronunciation is not easily deducible from the transliteration. Wikitiki reverted me once because that wasn't the intention behind |ts=, but I'm still not convinced it's such a bad idea. —Mahāgaja (formerly Angr) · talk 12:19, 4 April 2018 (UTC)Reply

@Mahagaja: The problem with putting both IPA and non-IPA in the |ts= parameter is that IPA should be formatted with IPA fonts and non-IPA without, and so one or the other will be formatted incorrectly. Or maybe it's acceptable to put both in an IPA font, if some transcriptions are IPA. (I have no objection to the transcription of Burmese using the IPA otherwise, as long as we specify that you can't, for instance, provide an IPA transcription of English in a linking template.) But this is perhaps not the best place for these remarks.... — Eru·tuon 17:43, 4 April 2018 (UTC)Reply

@Erutuon: And I have no objection to using IPA characters without formatting them as such, since doing so will invariably result in filling up CAT:IPA pronunciations with invalid IPA characters and CAT:IPA pronunciations with invalid representation marks. And I also have no objection to using a non-IPA, pronunciation-faithful transcription like BGN/PCGN (see WT:Burmese transliteration) instead of IPA. —Mahāgaja (formerly Angr) · talk 17:49, 4 April 2018 (UTC)Reply

@Mahagaja: Well, Module:links wouldn't put anything in those categories unless someone made it do that, and it's a fairly costly process in memory and processing time so it would be better not to. — Eru·tuon 18:44, 4 April 2018 (UTC)Reply

My objection was not to the transcription system, but to the idea of having separate transliterations and transcriptions for Burmese in general. Since Burmese is spelled phonetically enough for us to be able to automatically generate IPA, we do not need to have separate transliterations and transcriptions. If our transliterations are so convoluted that they mask the pronunciation, then perhaps we should use a different transliteration system. Or else when an etymology only makes sense given the actual pronunciation, then we can explain it in words instead of stuffing it into another template parameter. Like this. Although, frankly, many of our readers are not gonna make much more sense of that IPA either. --Wiki Tiki 89 18:54, 4 April 2018 (UTC)Reply

There are Burmese words whose pronunciation is unpredictable from the spelling, though. {{my-IPA}} relies on a lot of ad-hoc devices (apostrophes, plus signs, etc.) to get the IPA right; and there are words like ဘတ်စ်ကား (bhatcka:) whose spelling is so weird {{my-IPA}} fails on them and their transcription and pronunciation have to be added manually. —Mahāgaja (formerly Angr) · talk 19:05, 4 April 2018 (UTC)Reply

Even the example that Wikitiki89 mentions above, ခြင်္သေ့ (hkrangse.), seems to be unpredictable: it is pronounced /t͡ɕʰɪ̀ɴðḛ/, but {{my-IPA}} generates /t͡ɕʰɪ̀ɴθḛ/ if it's not supplied a respelling. (I could be wrong because I don't know Burmese.) — Eru·tuon 19:29, 4 April 2018 (UTC)Reply

Oh sorry, my bad. I take it back then. Still, I don't think we should be selectively adding transcriptions to some Burmese links. Either we're gonna (try to) add them everywhere, or only in the pronunciation section of the entry (which is, after all, what we do with English). --Wiki Tiki 89 20:48, 4 April 2018 (UTC)Reply

Rendaku-style voicing is very common in Burmese compounds but not 100% predictable; ခြင်္သေ့ (hkrangse.) could in principle be /t͡ɕʰɪ̀ɴðḛ/ or /t͡ɕʰɪ̀ɴθḛ/, but the former is maybe a little more likely. Reduction of the first syllable is also very common, and is often accompanied by voicing of the consonant before the reduced vowel, so in theory ခြင်္သေ့ (hkrangse.) could also be /t͡ɕʰəðḛ/ or /d͡ʑəðḛ/ (and this vowel reduction sometimes occurs in Upper Burma even where it doesn't in the standard language, so for all I know this word really is pronounced /t͡ɕʰəðḛ/ or /d͡ʑəðḛ/ in Mandalay). So I do think it would be helpful for people to see a pronunciation-faithful transcription – even in cases where the pronunciation is well-behaved, just so they know the word in question isn't one of the many exceptions. —Mahāgaja (formerly Angr) · talk 21:17, 4 April 2018 (UTC)Reply

I support the use of |ts= per Mahāgaja and not just for Burmese but other languages as well. Let's take Chinese (Mandarin) word 人民幣／人民币 (rénmínbì) for example. Some people might wonder why in the Russian transcription/transliteration of Mandarin the initial "r-" is rendered ж (ž), not р (r) in Russian, e.g. жэньминьби́ (žɛnʹminʹbí) ([ʐɨnʲmʲɪnʲˈbʲi]). With 人民幣／人民币 (rénmínbì /⁠ʐən³⁵ min³⁵ pi⁵⁠/) it would make a little clearer that Mandarin "r" is actually /ʐ/, for which the Russian /ʐ/ is almost identical and a much better fit than a Russian /r/ would be but in English it's /r/ which sounds closer to Mandarin /ʐ/. --Anatoli T. ^{(обсудить}/^вклад) 23:20, 4 April 2018 (UTC)Reply

Hmm, even I don't think we should use |ts= for that. Pinyin is already pronunciation-faithful, you just have to know what the symbols stand for. —Mahāgaja (formerly Angr) · talk 11:31, 5 April 2018 (UTC)Reply

What I meant is, contrasting various transliterations and romanisations/cyrillisations and some reasoning behind one or the other. E.g. (the automated transliteration here is Revised Romanization (RR): 평양 (Pyeong'yang /⁠pʰjʌ̹ŋja̠ŋ⁠/, “Pyongyang”), McCune–Reischauer: P'yŏngyang, Russian: Пхенья́н (Pxenʹján /⁠pxʲɪˈnʲjan⁠/) vs 부산 (Busan /⁠pusʰa̠n⁠/, “Busan”), McCune–Reischauer: Pusan, Russian: Пуса́н (Pusán /⁠pʊˈsan⁠/). The focus should be here on pʰ/p/ and p/b, this still causes a lot of confusion and arguments about the choices for Korean transliterations, which has pʰ/p/b sounds but the transliteration either matches the pronunciation or the spelling, depending on the position or the method used. Do you see what I'm trying to say? Similarly, in my previous post, I was contrasting the English/Mandarin pinyin "r" with the Cyrillic Russian "ж" with each other and the pronunciation in the source language. --Anatoli T. ^{(обсудить}/^вклад) 13:20, 5 April 2018 (UTC)Reply

Should we then do the same thing for Turkish: English soudjouk from Turkish sucuk (/⁠suˈdʒuk⁠/)? Because many of our readers will not know that c in Turkish is pronounced /dʒ/. I think a better solution to this issue is to use our words, as I already linked above, like this. --Wiki Tiki 89 15:42, 5 April 2018 (UTC)Reply

I agree with Wikitiki that for the Mandarin, Korean, and Turkish examples we should just give the IPA separately with {{IPAchar}}, outside the {{m}} template, and not with |ts=. But for Burmese, where the transliteration doesn't tell you whether certain consonants are voiced or not and whether certain vowels are reduced or not, I'd prefer to use |ts=. And wasn't the whole battle between Rua and Wyang regarding Thai transcription last year ultimately about the fact that one of them wanted a more spelling-based transliteration and the other wanted a more pronunciation-based transcription? Using |ts= for Thai would allow them both to have what they want, wouldn't it? —Mahāgaja (formerly Angr) · talk 18:39, 5 April 2018 (UTC)Reply

@Wikitiki89: What you did at chinthe is fine by me but to me "ts=" is a shortcut for the same thing. Perhaps the templates could display pronounced as ... or even consider what type of brackets to include - / / or [ ], depending on the type of IPA? @Mahagaja: Even for pronunciation-faithful scripts and transliterations, it makes sense to provide pronunciation in many cases, especially when symbols are very confusing or misleading in terms of the pronunciation. --Anatoli T. ^{(обсудить}/^вклад) 02:20, 7 April 2018 (UTC)Reply

Well what I'm trying to say is precisely that |ts= is not a shortcut for the same thing. It's intended for an entirely different purpose. --Wiki Tiki 89 13:42, 9 April 2018 (UTC)Reply

I really think that if IPA is going to be included, it needs its own parameter: |pron= or, more unambiguously, |ipa=. As you say, IPA shouldn't have hardcoded brackets because it could be either phonetic or phonemic, and it needs to have the IPA class attribute (class="IPA") so that the proper fonts are applied and so that it can be located on a page if anyone wants to mess with it further using CSS or JavaScript. — Eru·tuon 19:29, 7 April 2018 (UTC)Reply

It's not the job of |ts= to convey an IPA level of pronunciation accuracy. In the example of ခြင်္သေ့ (hkrangse.), I can't say I know much of anything about Burmese, but t͡ɕʰəðḛ is clearly an IPA rendering, not a transcription, and therefore should not be placed in |ts=. --Victar (talk) 00:30, 7 April 2018 (UTC)Reply

Yes, when we were discussing |ts= originally, it was made very explicit that it should not be used for IPA and should not overlap with the function of the "Pronunciation" section, whose purpose is to give narrow pronunciation information about a word. The transcription parameter was created to include information traditionally associated with/reconstructed for a term but not automatically generatable from the transliteration, all without interfering with the transliteration. The whole point is to give the representations of lemmata that the user is most likely to find in a dictionary (native script, transliteration, (reconstructed) transcription), not narrow pronunciation information. I'd brought up the notion before of limiting the distribution of this parameter to scripts that were deficient enough in information to merit transcription (abjads, syllabaries, cuneiform) so as to avoid this parameter being abused. —*i̯óh₁n̥C ^[5] 00:48, 7 April 2018 (UTC)Reply

One way to think about it is: if it wouldn't pass as a headword on wikt, it shouldn't pass for use in |ts=. In the example of ခြင်္သေ့ (hkrangse.) again, hkrangse. gives us enough information to both disambiguate and get a general understanding as to how to pronounce the word. Anything more is overreach. --Victar (talk) 01:15, 7 April 2018 (UTC)Reply

I would like to make a late addition to the conversation to say that I agree with John, Victar, etc. —Μετάknowledge^{discuss/deeds} 17:55, 8 April 2018 (UTC)Reply

mention-pos class

Latest comment: 5 years ago, 13 comments, 2 people in discussion

@Erutuon any chance we can get a <span class="mention-pos"></span> around the |pos= so users can apply custom CSS? --{{victar|talk}} 23:10, 27 December 2018 (UTC)Reply

I think it's a good idea to have classes identifying the parts of the output of Module:links. Do you want this in the "mention" template {{m}} and other templates using the same face (confusing term; see Module:script utilities/documentation § tag text) or in the faces used by {{l}} and {{t}} as well? I ask because if it's the latter, then mention-pos is a misleading name because the face used by {{m}} is called "mention"; not sure what would be a good replacement. — Eru·tuon 23:37, 27 December 2018 (UTC)Reply

@Erutuon: I think having it in all templates would be great, so the more generic the class name the better. --{{victar|talk}} 23:53, 27 December 2018 (UTC)Reply

@Victar: Finally coming back to this. Maybe annotation-pos or link-annotation-pos would make sense, because it would allow for other similar classes for the other annotations. — Eru·tuon 21:25, 24 January 2019 (UTC)Reply

@Erutuon, annotation-pos sounds good to me. --{{victar|talk}} 05:24, 25 January 2019 (UTC)Reply

@Victar: Now I'm dithering. POS parameters aren't used very often, but if the prefix is used for other annotation items that are more frequently used, it might increase Lua memory usage and the size of the HTML created by the parser. So maybe I should try to come up with an abbreviated prefix... — Eru·tuon 20:28, 28 January 2019 (UTC)Reply

@Erutuon: "POS parameters aren't used very often"? Really, because I use |pos= just about every day and find them often used? --{{victar|talk}} 20:49, 28 January 2019 (UTC)Reply

@Victar: I know they're used, but not as often as other annotations. Certainly not as often as transliteration. I previewed a page (a) with the proposed change and checked how many there were (1), but probably it was not a good example because it is not the sort of page that would have a lot of POS annotations. Anyway, this might not be a real issue, but I've got to look at it. — Eru·tuon 20:54, 28 January 2019 (UTC)Reply

@Erutuon: I would say it's used as much as |g= and we have <span class="gender"></span> for that. --{{victar|talk}} 21:24, 28 January 2019 (UTC)Reply

@Victar: Right. Maybe I'm being unclear; I'm not arguing for not adding a class, but just trying to determine if it's worth making the class name short or not. — Eru·tuon 21:27, 28 January 2019 (UTC)Reply

@Erutuon: What about .ann-pos. While we're at it, .gender could be renamed to .ann-g. --{{victar|talk}} 21:38, 28 January 2019 (UTC)Reply

@Victar: I finally added class="ann-pos" to the part of speech annotation. I decided against changing class="gender" around the gender annotation because some JavaScript and CSS relies on it. Unfortunately the class names for the various annotations have very little rhyme or reason to them.... — Eru·tuon 01:20, 5 February 2019 (UTC)Reply

Well that's triggering my OCD... Thanks for adding that, @Erutuon. It could also be shortened even more to .a-pos, but if you think .ann-pos is short enough, then that's fine with me. --{{victar|talk}} 02:50, 5 February 2019 (UTC)Reply

Glitch with character entity references

Latest comment: 5 years ago, 4 comments, 2 people in discussion

This module seems unable to handle character entity references correctly. {{l|en|&amp;}} results in "&amp", with the semicolon removed in the link, while {{l|en|&#38;}} results in "&" as expected, with the numeric reference replaced with the actual character. It looks like the substitution for numeric references at makeLangLink is unwittingly removing semicolons where it shouldn't. Not sure if this is urgent but just putting it out there. Nardog (talk) 15:56, 16 May 2019 (UTC)Reply

@Nardog: Hm, not sure what's going on there. The substitution that handles decimal character entity references doesn't affect this, and there doesn't seem to be anything else in this module that concerns semicolons. I've added it to the testcases. — Eru·tuon 16:19, 16 May 2019 (UTC)Reply

Aha. The semicolon is being removed by makeEntryName in Module:languages. That's useful in Ancient and Modern Greek at least, where the semicolon is a question mark; it's our convention to omit question and exclamation marks in page titles but to show them in headword lines and probably in links to the entry. Not sure if leaving semicolons would have bad effects in other languages (that is, make links go to the wrong pages). — Eru·tuon 22:35, 16 May 2019 (UTC)Reply
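A minimal sketch of why the order of operations matters here, assuming plain Lua. The names `decode_entities`, `strip_semicolons` and `make_entry_name_sketch` are hypothetical and are not the actual makeEntryName code in Module:languages; the sketch only decodes ASCII decimal references and a handful of named entities, whereas on-wiki code could use mw.text.decode instead.

```lua
-- Small, deliberately incomplete table of named entities.
local named = { amp = "&", lt = "<", gt = ">", quot = '"', apos = "'" }

-- Decode decimal references below 128 and the named entities above.
-- Anything unrecognized is left untouched.
local function decode_entities(text)
	text = text:gsub("&#(%d+);", function(digits)
		local code = tonumber(digits)
		if code < 128 then
			return string.char(code)
		end
	end)
	text = text:gsub("&(%a+);", function(name)
		return named[name]
	end)
	return text
end

-- Stand-in for punctuation removal in entry names (the semicolon serves
-- as a question mark in Greek).
local function strip_semicolons(text)
	return (text:gsub(";", ""))
end

-- Decoding first means the semicolon that terminates an entity is consumed
-- before punctuation is stripped.
local function make_entry_name_sketch(text)
	return strip_semicolons(decode_entities(text))
end
```

Stripping without decoding turns "&amp;" into "&amp", which matches the broken link reported above; decoding first yields "&" while still removing a genuine trailing semicolon.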

@Erutuon: Thanks. To give you some context, I found the glitch when I was editing Module:ja and Module:ja-pron, which romanize っ representing a glottal stop as an apostrophe. They used ', but that made the rōmaji link at あっ generated by {{ja-pos}} not work (spotted by TAKASUGI Shinji), so I changed it to the decimal reference. Nardog (talk) 11:02, 17 May 2019 (UTC)Reply

Removal of `dontLinkRecons`

Latest comment: 4 years ago, 1 comment, 1 person in discussion

The dontLinkRecons parameter to several of the linking functions seems to have had no effect for more than two years, and asterisked terms can be prevented from linking using alt parameters, so I think there's no need for it. I removed it from the module and the documentation. If this isn't correct, please let me know here. See also Module talk:anagrams#dontLinkRecons. — Eru·tuon 21:10, 3 December 2019 (UTC)Reply

Non-breaking spaces not recognized

Latest comment: 3 years ago, 1 comment, 1 person in discussion

For example, on Sinn machen

{{m-self|de|Sinn machen}} (with a regular space) is bolded and unlinked, but
{{m-self|de|Sinn machen}} (with a U+00A0 NO-BREAK SPACE) and
{{m-self|de|Sinn&nbsp;machen}} are unbolded links.

These are equivalent for MediaWiki ([[Sinn machen]] → Sinn machen works just fine and links to the right page), so the module should probably recognize it as a self-link as well. (The same goes for {{l-self}}, too, of course.) The incorrect behavior can be observed in the titles of the conjugation tables in the said entry: there shouldn’t be links. —Tacsipacsi (talk) 21:40, 13 November 2020 (UTC)Reply
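One possible shape for the fix, sketched in plain Lua under the assumption that the self-link check compares the link target with the current page title as strings. The names `normalize_title` and `is_self_link` are hypothetical and do not correspond to functions in the module.

```lua
local NBSP = "\194\160" -- UTF-8 encoding of U+00A0 NO-BREAK SPACE

-- Fold the space variants that MediaWiki treats alike in titles into a
-- single regular space, then trim.
local function normalize_title(text)
	text = text:gsub("&nbsp;", " ")
	text = text:gsub(NBSP, " ")
	text = text:gsub("_", " ")
	text = text:gsub(" +", " ")
	return (text:match("^ *(.-) *$"))
end

-- A target counts as a self-link when it matches the current page title
-- after normalization.
local function is_self_link(target, current_title)
	return normalize_title(target) == normalize_title(current_title)
end
```

With this comparison, all three {{m-self}} examples above would be recognized as links to the page itself.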

desc

Latest comment: 2 years ago, 2 comments, 1 person in discussion

Empty {{desc}}s are causing term requests in the maintenance entries, like on ZomBear's page and sandbox. Vininn126 (talk) 17:09, 9 February 2022 (UTC)Reply

@Erutuon @Benwing2 Vininn126 (talk) 19:20, 10 February 2022 (UTC)Reply

Links to other Wiktionaries in user namespace

Latest comment: 1 year ago, 2 comments, 2 people in discussion

@Benwing2: Apparently, the namespace matters when creating links to other Wiktionaries, see [2] and compare it to how it looks when I use the same code here:

I don't think this is intended behavior. — Fytcha〈 T | L | C 〉 22:13, 1 November 2022 (UTC)Reply

The raw wikitext output is the same in both cases, but it renders differently. You can confirm this by going to Special:ExpandTemplates, putting {{l|de|de:Haus}} into the textbox, and setting the page name to either "Module talk:links" or "User:Fytcha/Sandbox". The resulting link code for either title is [[de:Haus#German|de:Haus]].

The reason it renders differently is because in the context of the page User:Fytcha/Sandbox, the link is interpreted as an interwiki link to be displayed in the sidebar, whereas this consideration apparently doesn't apply to talk namespaces like "Module talk". It's still weird that {{l|de|:de:Haus}} results in the same behavior, because normally prefixing an interwiki link with a colon prevents it from being displayed in the sidebar. The raw link code generated is [[:de:Haus#German|de:Haus]]. I guess that prefixing with : doesn't escape interwiki links in the same way that : does. 98.170.164.88 22:35, 1 November 2022 (UTC)Reply

How to add a line break between a word and its transliteration?

Latest comment: 1 year ago, 1 comment, 1 person in discussion

At აპირებს#Conjugation and many more pages like this, the conjugation table is chaotic: sometimes there is a line break between a word and its transliteration, sometimes not. I want to make it consistent and make sure there is always a line break. How do I do that with this or any other module/template? Gradilion (talk) 10:33, 7 June 2023 (UTC)Reply

Format Characters in Alternative Display

Latest comment: 1 year ago, 1 comment, 1 person in discussion

@Theknightwho: May there be any potential problem beyond transliteration with loading the alternative display supplied to full_link() with stateless Unicode formatting characters, such as soft hyphens (U+00AD SHY) and word joiners (U+2060 WJ)? RichardW57m (talk) 10:34, 12 June 2023 (UTC)Reply

allow_self_link is ignored???

Latest comment: 4 months ago, 1 comment, 1 person in discussion

@Theknightwho Since Jan, allow_self_link is being ignored, contrary to the docs. Can you explain the logic here? I tried to enable self links in Module:place line 195, but the term 'Mexico' still shows up as a bold unclickable link regardless. Benwing2 (talk) 04:34, 9 March 2024 (UTC)Reply
