Script recognition module
You're right. A letter like "C" probably belongs to both Latn and Latinx. The same problem would probably occur with pa-Arab, ota-Arab, etc. if we had similar categories for the Arabic script.
Maybe it's not feasible, but can findBestScript iterate over all scripts while giving priority to four-letter scripts? If it finds a match in Latn or Arab, it stops the search and does not go on to Latinx and fa-Arab.
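A rough sketch of that "four-letter codes first" idea, written in Python for illustration rather than in the module's Lua. The names find_best_script, is_plain_code and SCRIPTS, and the tiny character sets, are hypothetical stand-ins and not the real Module:scripts data or API; the real findBestScript may work quite differently.

```python
# Hypothetical script table: code -> set of characters the script covers.
# The real data lives in Module:scripts/data and is far larger.
SCRIPTS = {
    "Latinx": set("ABCabcÞþ"),
    "Latn": set("ABCabc"),
    "fa-Arab": set("ابپ"),
    "Arab": set("ابپ"),
}


def is_plain_code(code):
    """True for bare four-letter codes such as Latn or Arab."""
    return len(code) == 4 and code.isalpha()


def find_best_script(text, scripts=SCRIPTS):
    """Return the first script covering every character of text,
    trying four-letter codes before any other code."""
    plain = [c for c in scripts if is_plain_code(c)]
    others = [c for c in scripts if not is_plain_code(c)]
    for code in plain + others:
        if text and all(ch in scripts[code] for ch in text):
            return code  # stop at the first match
    return None
```

With this ordering, "C" resolves to Latn even though Latinx also contains it, while a character that only Latinx covers still falls through to Latinx.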
Or maybe just give priority to Latn over Latinx and forget Arab and the others unless they become a problem at some point.
We could also change the data format of the scripts a bit, giving them a "hierarchy" of some sort.
Suggestion: in Latinx, nv-Latn, pjt-Latn... add parent = "Latin". In Latn, Grek, Cyrl... add parent = "top". And in findBestScript, give priority to scripts that have parent = "top".
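To make the suggestion concrete, here is a minimal sketch of what a parent field and the matching priority rule could look like. It is in Python for illustration, not the module's Lua; SCRIPT_DATA, the "chars" key and this find_best_script are assumptions made up for the example, and only the parent values "Latin" and "top" come from the proposal itself.

```python
# Hypothetical excerpt of the data; "parent" is the proposed new field.
SCRIPT_DATA = {
    "Latinx": {"chars": set("ABCabcÞþ"), "parent": "Latin"},
    "nv-Latn": {"chars": set("ABCabcŁł"), "parent": "Latin"},
    "Latn": {"chars": set("ABCabc"), "parent": "top"},
    "Cyrl": {"chars": set("АБВабв"), "parent": "top"},
}


def find_best_script(text, data=SCRIPT_DATA):
    """Return the first matching script, checking top-level scripts first."""
    # sorted() is stable: scripts with parent "top" move to the front,
    # all other scripts keep their original relative order.
    ordered = sorted(data, key=lambda code: data[code]["parent"] != "top")
    for code in ordered:
        if text and all(ch in data[code]["chars"] for ch in text):
            return code
    return None
```

The advantage over relying on code length is that the priority is stated in the data, so overlapping cases can be adjusted per script without touching the search function.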
I added the parent field to all scripts in Module:scripts/data. Feel free to check if I did it right. I'm not sure what to do with cases like Jpan, Hira, Kana, Hani, Hans, where scripts overlap, so whenever I was in doubt I used parent = "top".
I also created a function, :getParent(). I tested it and it works.
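For readers who have not looked at the module, an accessor of this kind might look roughly like the following. This is a Python stand-in, not the actual Lua code; the Script class, its constructor and the method name get_parent are hypothetical, and the real :getParent() may behave differently (for example when a script has no parent field).

```python
class Script:
    """Minimal stand-in for a script object; not the real module's class."""

    def __init__(self, code, data):
        self._code = code
        self._data = data  # one entry of the (hypothetical) data table

    def get_code(self):
        return self._code

    def get_parent(self):
        # Returns None when the entry has no parent field.
        return self._data.get("parent")


latinx = Script("Latinx", {"parent": "Latin"})
latn = Script("Latn", {"parent": "top"})
```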
I don't know yet whether I'll be able to make findBestScript give priority to scripts that have parent = "top". If you'd like to do it, please be my guest. Otherwise, I'll try later.