Transliteration questions

Fragment of a discussion from User talk:Rua
Edited by author.
Last edit: 16:28, 8 July 2015

I can't help you at all on the first part, sorry.

For the Italic alphabets, the common set was chosen so that it could apply to all languages. If something doesn't apply to all languages equally, then it shouldn't be in the common set. Alternatively, you could transliterate the language-specific features first, and let the common set handle whatever remains after that.
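
A minimal sketch of that layering, in Python rather than the module's own language. The table contents, the language code "xx-demo" and the function name are all placeholders made up for illustration; they are not the actual correspondences.

```python
# Placeholder tables: illustrative values only, not the module's real data.
COMMON = str.maketrans({"𐌀": "a", "𐌁": "b", "𐌅": "v"})

SPECIFIC = {
    # Hypothetical language code -> overrides applied before the common set.
    "xx-demo": str.maketrans({"𐌅": "w"}),
}


def transliterate(text, lang):
    # Pass 1: language-specific overrides, if this language has any.
    override = SPECIFIC.get(lang)
    if override:
        text = text.translate(override)
    # Pass 2: the common set handles whatever Old Italic characters remain.
    return text.translate(COMMON)
```

Because the first pass already outputs Latin letters, the common pass leaves those untouched and only converts what is still in the original script.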

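One thing to be careful with is using gsub with '.' to replace multi-character combinations. That won't work: '.' matches a single character at a time, so the lookup never sees a longer sequence. Extending the pattern to '..' doesn't help either, in case you were thinking of that, because it consumes the text in fixed pairs and misses any combination that doesn't line up with them. The way I handle these situations is a bit more elaborate, but it works much better.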

  • "rest" contains characters yet to be processed, "parts" is a table containing characters or sequences that were recognised.
  • Check the start of the "rest" string against each of the search sequences and find the longest one that matches.
  • Once the longest match is determined, insert that into the list of parts. If no match was found at all, just insert the first character.
  • Remove the processed characters from "rest".
  • Repeat until "rest" is empty.
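
The steps above could be sketched roughly as follows. This is a Python illustration rather than the actual Lua module code; the function names and the sample table are hypothetical, and only "rest" and "parts" come from the description.

```python
def tokenize(text, sequences):
    """Split text into recognised sequences, preferring the longest match."""
    # Try longer sequences first so that "th" wins over "t".
    by_length = sorted(sequences, key=len, reverse=True)
    rest = text   # characters yet to be processed
    parts = []    # recognised sequences, or single unmatched characters
    while rest:
        match = next((s for s in by_length if s and rest.startswith(s)), None)
        if match is None:
            match = rest[0]  # no match at all: take just the first character
        parts.append(match)
        rest = rest[len(match):]  # remove the processed characters
    return parts


def transliterate_longest(text, table):
    # Replace each recognised part; leave unknown characters unchanged.
    return "".join(table.get(part, part) for part in tokenize(text, table))


# Hypothetical table with one two-character sequence.
SAMPLE = {"t": "t", "h": "h", "th": "θ"}
```

With this sample table, `tokenize("thath", SAMPLE)` splits the input into `"th"`, `"a"`, `"th"`, so the two-character sequence is recognised even though its individual characters are also in the table.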
CodeCat 16:20, 8 July 2015

I currently have it transliterating the language-specific features first, then the common set second.

Any idea how to get ⁚ and : to both transliterate to f?

And do you think I need to have a vote or something about these correspondences, or should I just enact them de facto?

JohnC5 16:26, 8 July 2015

Would you not just add them to the table?
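
For instance, assuming the table is a plain character-to-replacement mapping, both characters can simply be listed as separate keys with the same value. This is a Python sketch with made-up names, not the module's actual table:

```python
# Two distinct keys may map to the same output.
PUNCT_TABLE = {
    "\u205A": "f",  # ⁚ (U+205A TWO DOT PUNCTUATION)
    ":": "f",       # ASCII colon
}


def transliterate_chars(text, table):
    # Look up each character; anything not in the table passes through.
    return "".join(table.get(ch, ch) for ch in text)
```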

CodeCat 16:29, 8 July 2015