8 Apr 03:55
Re: space-like unicode char
Erik van der Poel <erik <at> vanderpoel.org>
2005-04-08 01:55:20 GMT
Soobok Lee wrote:
> U+1160 is a space-like char and even stringprep/nameprep does not
> filter it out because the char is not for punctuational purpose.

U+1160 is HANGUL JUNGSEONG FILLER and it is used to transform nonstandard syllables into standard ones (Unicode 3.0 section 3.11 (RFC 3454 refers to Unicode 3.2.0)).

However, this transformation is one of the additional transformations not considered part of Unicode normalization (3.2.0's UAX #15 Annex 10). So this character is not generated by Stringprep/Nameprep. However, it is not prohibited either, so it may occur in the input to (and output from) Stringprep/Nameprep.

I read some of the sections on Hangul in the Unicode book and Web site, but I did not see any rules regarding repeated occurrences of U+1160 (as you had in your example, not quoted above). I also did not see any rules about what to do when a filler is not followed by a Hangul jamo. It would be nice to have these rules in Unicode or in Stringprep.

I tried U+1160 followed by a Latin character in MSIE with i-Nav and in Firefox with IDN turned on, and it was displayed as a wide space. It is unfortunate that both implementations chose to display it as a space instead of deleting it.

Erik
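The point that U+1160 is neither mapped away nor prohibited can be checked against the RFC 3454 tables. The following is only a rough sketch, assuming Python's standard `stringprep` module (which exposes those tables, based on Unicode 3.2). The helper name `nameprep_status` is made up for illustration, and the sketch looks at one character at a time; it does not run NFKC normalization or the bidi checks that a full Nameprep pass would.

```python
import stringprep

# Prohibition tables referenced by the Nameprep profile (RFC 3491, section 5).
_PROHIBITED_TABLES = (
    stringprep.in_table_c12,  # non-ASCII space characters
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c7,   # inappropriate for canonical representation
    stringprep.in_table_c8,   # change display properties or deprecated
    stringprep.in_table_c9,   # tagging characters
)


def nameprep_status(ch: str) -> str:
    """Classify a single character against the RFC 3454 tables (hypothetical helper)."""
    if stringprep.in_table_a1(ch):
        return "unassigned"
    if stringprep.in_table_b1(ch):
        return "mapped to nothing"
    if any(check(ch) for check in _PROHIBITED_TABLES):
        return "prohibited"
    return "allowed"


if __name__ == "__main__":
    # U+1160 HANGUL JUNGSEONG FILLER, U+3000 IDEOGRAPHIC SPACE, U+00AD SOFT HYPHEN
    for ch in ("\u1160", "\u3000", "\u00ad"):
        print(f"U+{ord(ch):04X}: {nameprep_status(ch)}")
```

Under these assumptions the filler falls through every table, which matches the observation above that it can appear in both the input to and the output from Stringprep/Nameprep.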
We can find similar problems with the "combining diacritical marks" (U+03xx).
What if a label consists of a single character, a combining accent or
combining dot above, with no preceding letter? The mark will combine with
the dot delimiter that precedes the label, and that will produce a
confusing appearance (it looks like a colon, which is a protocol delimiter).
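A label that begins with a combining mark can be detected before it is ever displayed. This is a minimal sketch, assuming Python's standard `unicodedata` module; the function name `starts_with_combining_mark` is hypothetical, and the check uses the Unicode database bundled with the interpreter rather than Unicode 3.2.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first character of the label is a combining mark.

    General categories Mn, Mc and Me all start with "M"; such a character
    attaches visually to whatever precedes it, here the "." delimiter.
    """
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")


def suspicious_labels(domain: str) -> list[str]:
    """List the labels of a dotted name that begin with a combining mark."""
    return [lbl for lbl in domain.split(".") if starts_with_combining_mark(lbl)]


if __name__ == "__main__":
    # U+0307 COMBINING DOT ABOVE used as a whole label, as in the case described above
    print(suspicious_labels("example.\u0307.com"))
```

A resolver or browser could reject or escape such labels rather than render them; whether that belongs in Stringprep or in the application is a separate question.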