Hindi searches

Persian often uses a zero-width non-joiner (U+200C), as in ویکی‌پدیا. People who don’t know how to type it tend to substitute a space: ویکی پدیا. Strictly speaking that is a misspelling, but many people have no practical alternative.

In languages like Khmer and Thai, which do not put spaces between words, a zero-width space (U+200B) is sometimes inserted, as in តើអ្នកនិយាយ​ភាសាអង់គ្លេស​ទេ. More often than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are correct.

I think Anatoli neglected to mention the word-final Arabic pair ه/ة. The final letter ة may be typed as ه.

—Stephen (Talk) 23:05, 31 January 2011

Do you know how MediaWiki currently handles this? We obviously don't want all spaces to be normalized to a zero-width character, and we don't want a zero-width character to be normalized to a space either (or terrible matches would be made).

TheDaveRoss 23:13, 31 January 2011

No, I don’t know how it works. U+200C and U+200B definitely should not be removed or changed in the stored text, but for searching, U+200C should be seen by the software as the equivalent of a space, and U+200B should be seen as the equivalent of nothing at all.

—Stephen (Talk)00:07, 1 February 2011
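
To make the suggestion above concrete, here is a minimal sketch in Python of a search key built along those lines. It is not how MediaWiki works (nobody in this thread knows that); `search_key` is a hypothetical name, and folding word-final ة to ه in that direction is an assumption made only so that both typings compare equal. The stored text is never modified; only the key used for comparison is.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner
ZWSP = "\u200b"  # zero-width space

def search_key(text: str) -> str:
    """Return a comparison key for matching; the original text is left untouched."""
    # U+200B counts as nothing at all.
    text = text.replace(ZWSP, "")
    # U+200C counts as the equivalent of a space.
    text = text.replace(ZWNJ, " ")
    # Assumption: fold word-final Arabic teh marbuta (U+0629) to heh (U+0647).
    text = re.sub("\u0629(?!\\w)", "\u0647", text)
    # Collapse runs of whitespace so the key is stable.
    return " ".join(text.split())

# Both Persian typings of "Wikipedia" produce the same key.
with_zwnj = "\u0648\u06cc\u06a9\u06cc\u200c\u067e\u062f\u06cc\u0627"
with_space = "\u0648\u06cc\u06a9\u06cc \u067e\u062f\u06cc\u0627"
print(search_key(with_zwnj) == search_key(with_space))
```

Applying the same function to both the query and the indexed title means a plain space still only matches a space or U+200C, so ordinary spaces are never turned into zero-width characters.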