Script recognition module

Fragment of a discussion from User talk:Rua

That would be Module:scripts. Specifically the findBestScript function.

CodeCat 21:06, 27 July 2016

Thank you.

--Daniel Carrero (talk) 21:07, 27 July 2016

I would like it if {{auto cat}} (or {{charactercat}}, or whatever template), when used in Category:Bb, automatically recognized that "Bb" is in Latin script. For example, the category could be placed in "Category:Latin script something", the description could mention "Latin script", and the "Bb" in the description could be tagged with the right script in the code.

Likewise, Category:Δδ can be created for Greek script.

And Category:Bb: ⠃ (Latin–Braille) already exists. The category name has a mixture of scripts, but the module is already prepared to recognize the different contents before and after the colon.

But findBestScript requires a language code and the categories mentioned are multi-language categories. Can't we change the module so that it iterates over all scripts, when the language is und or something?

--Daniel Carrero (talk) 00:13, 28 July 2016
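The proposal above (try every script when no language is given) could be sketched roughly as follows. This is a Python illustration, not the Lua code of Module:scripts. The `SCRIPTS` table, its character ranges, and the function name `find_scripts_for_text` are all hypothetical and heavily simplified.

```python
import re

# Hypothetical, simplified script data: script code -> regex character class.
# The real Module:scripts/data is a Lua table and is not reproduced here.
SCRIPTS = {
    "Latn": "A-Za-z",
    "Grek": "\u0370-\u03FF",
    "Brai": "\u2800-\u28FF",
}


def find_scripts_for_text(text, scripts=SCRIPTS):
    """Return every script code whose characters occur in `text`.

    No language code is needed: the function simply walks all scripts.
    """
    matches = []
    for code, chars in scripts.items():
        if re.search("[" + chars + "]", text):
            matches.append(code)
    return matches
```

With this toy data, `find_scripts_for_text("Bb")` returns `["Latn"]` and `find_scripts_for_text("Δδ")` returns `["Grek"]`. A mixed title such as "Bb: ⠃" matches both Latn and Brai, which is why the text before and after the colon would be checked separately.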

That can work, but what about cases like Latn vs Latinx? A language would never have both as its script, but if the function blindly goes over all the scripts, both could match.

CodeCat 00:34, 28 July 2016

You're right. A letter like "C" is probably both Latn and Latinx. The same problem probably would happen with pa-Arab, ota-Arab, etc. if we had similar categories for the Arabic script.

Maybe it's not feasible, but could findBestScript iterate over all scripts while giving priority to 4-letter script codes? If it finds a match in Latn or Arab, it stops the search and does not go on to Latinx and fa-Arab.

Or maybe just give priority to Latn over Latinx and forget Arab and the others unless they become a problem at some point.

--Daniel Carrero (talk) 00:46, 28 July 2016
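The "4-letter codes first" idea could look something like the sketch below. It is written in Python for illustration only. The script table, the overlapping character ranges, and the names `is_plain_code` and `find_best_script_any_language` are assumptions, not part of the actual module.

```python
import re

# Hypothetical data: the variant codes deliberately share their ranges with
# the plain codes, to reproduce the ambiguity discussed in the thread.
SCRIPTS = {
    "Latinx": "A-Za-z\u00C0-\u024F",
    "Latn": "A-Za-z\u00C0-\u024F",
    "fa-Arab": "\u0600-\u06FF",
    "Arab": "\u0600-\u06FF",
}


def is_plain_code(code):
    """True for plain four-letter codes such as Latn or Arab."""
    return len(code) == 4 and code.isalpha()


def find_best_script_any_language(text, scripts=SCRIPTS):
    """Return the first matching script, trying four-letter codes first."""
    # sorted() is stable, so codes keep their relative order within each group.
    ordered = sorted(scripts, key=lambda code: not is_plain_code(code))
    for code in ordered:
        if re.search("[" + scripts[code] + "]", text):
            return code  # stop at the first match; variants are never reached
    return None
```

Here "C" resolves to Latn even though Latinx is listed first and also matches, because the search stops as soon as a four-letter code fits.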

We could also change the data format of the scripts a bit, giving them a "hierarchy" of some sort.

CodeCat 00:58, 28 July 2016

Suggestion: in Latinx, nv-Latn, pjt-Latn... add parent = "Latin",.

In Latn, Grek, Cyrl... add parent = "top",.

And in findBestScript, give priority to scripts that have "parent = top".

--Daniel Carrero (talk) 01:20, 28 July 2016
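As a rough illustration of this suggestion, the sketch below gives each script a `parent` field and sorts candidates so that those with `parent = "top"` come first. It is Python rather than the module's Lua. The table contents and the function name `prioritize` are hypothetical; the parent values are copied from the suggestion as written.

```python
# Hypothetical script data with the proposed "parent" field.
SCRIPTS = {
    "Latn": {"parent": "top"},
    "Grek": {"parent": "top"},
    "Cyrl": {"parent": "top"},
    "Latinx": {"parent": "Latin"},
    "nv-Latn": {"parent": "Latin"},
    "pjt-Latn": {"parent": "Latin"},
}


def prioritize(candidates, scripts=SCRIPTS):
    """Reorder candidate script codes so that top-level scripts come first.

    The sort is stable, so the original order is kept within each group.
    """
    return sorted(candidates, key=lambda code: scripts[code]["parent"] != "top")
```

For example, `prioritize(["Latinx", "nv-Latn", "Latn"])` puts Latn at the front, so a search that walks the list in order would settle on Latn before considering the variants.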

Yeah, something like that.

CodeCat 01:35, 28 July 2016

I added the parent to all scripts in Module:scripts/data. Feel free to check if I did it right. I'm not sure what to do with cases like Jpan, Hira, Kana, Hani, Hans, where scripts overlap, so when in doubt I used parent = "top", in all cases.

I also created a function :getParent(). I tested it; it's working.

I don't know yet if I would be able to make findBestScript give priority to scripts that have parent = "top",. If you'd like to do it, please be my guest. Otherwise, I think I should try later.

--Daniel Carrero (talk) 08:37, 29 July 2016

I think there should just be no parent when there isn't any, rather than "top".

CodeCat 15:44, 29 July 2016
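The alternative of simply leaving the field out can be sketched as below, again in Python and under assumptions: the table is hypothetical, and `get_parent` here only mimics what a :getParent() method might return; the real function's behaviour is not shown in this thread.

```python
# Hypothetical script data: top-level scripts have no "parent" key at all.
SCRIPTS = {
    "Latn": {},
    "Grek": {},
    "Latinx": {"parent": "Latin"},
    "nv-Latn": {"parent": "Latin"},
}


def get_parent(code, scripts=SCRIPTS):
    """Return the parent value of a script, or None when it has none."""
    return scripts[code].get("parent")


def is_top_level(code, scripts=SCRIPTS):
    """A script with no parent is treated as top-level."""
    return get_parent(code, scripts) is None
```

The absence of a value then carries the same information as the "top" marker, without needing a sentinel string that could be mistaken for a script name.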

Woopsie! Could we add ancestor information too? Sorry I'm such a goofus!

JohnC5 03:22, 1 August 2016
 

Ok. I removed all instances of parent="top".

--Daniel Carrero (talk) 21:52, 9 August 2016