5 May 23:34
homographs in TrueType fonts
Erik van der Poel <erik <at> vanderpoel.org>
2005-05-05 21:34:41 GMT
I have written a small program that parses a number of TrueType font tables to determine which pairs of Unicode codepoints end up using the same glyphs. The ASCII part of the table is included below. Each line has a codepoint, its glyph, the other codepoint of the pair, and the number of fonts in which that pair is identical.

U+2044 and U+2215 use the same glyph as the slash (U+002F) in a few East Asian fonts. Note also that the capital letters I and O have homographs, although some apps present domain names in lower case, so those homographs would stand out in those apps.

For the complete table, see:
http://nameprep.org/tt-hg.html

Erik

0021(!);01C3;2
0022(");02BA;4
0022(");05F4;12
0027(');0060;1
0027(');02B9;4
0027(');05F3;12
0027(');2032;6
0028(();FD3E;3
0029());FD3F;3
002C(,);201A;9
002D(-);2010;12
002D(-);2012;1
002D(-);2013;2
002F(/);2044;3
002F(/);2215;4
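The program itself is not shown in this message, so the following is only a rough sketch of the counting step it describes, not Erik's actual code. It assumes each font's character map has already been read into a dict of codepoint to glyph identifier (for instance with a library such as fontTools, which is an assumption here and is not used below). The names shared_glyph_pairs and format_row, and the sample data, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_glyph_pairs(cmaps):
    """Count, for each codepoint pair, the fonts that map both to one glyph.

    `cmaps` is an iterable of dicts {codepoint: glyph_id}, one dict per font.
    (Assumed input shape; reading the real cmap table is out of scope here.)
    """
    counts = Counter()
    for cmap in cmaps:
        # Group this font's codepoints by the glyph they point at.
        by_glyph = defaultdict(list)
        for codepoint, glyph in cmap.items():
            by_glyph[glyph].append(codepoint)
        # Every pair inside a group is a homograph pair in this font.
        for codepoints in by_glyph.values():
            for a, b in combinations(sorted(codepoints), 2):
                counts[(a, b)] += 1
    return counts


def format_row(pair, count):
    """Render one row in the 'codepoint(glyph);other codepoint;fonts' layout."""
    a, b = pair
    return f"{a:04X}({chr(a)});{b:04X};{count}"


if __name__ == "__main__":
    # Made-up sample data, not taken from any real font.
    font_a = {0x2F: "slash", 0x2044: "slash", 0x2215: "slash"}
    font_b = {0x2F: "slash", 0x2215: "slash", 0x2044: "fraction"}
    for pair, count in sorted(shared_glyph_pairs([font_a, font_b]).items()):
        print(format_row(pair, count))
```

Grouping by glyph first keeps the work proportional to the size of each group, so a font with no shared glyphs costs a single pass over its character map.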
To the contrary, I find it extremely interesting that some
people were able to rename charsets "scripts" in order to insert charsets
into language descriptions while claiming they don't (cf. above). Obviously
they are unhappy when I expose the trick. Anyway the result is great fun:
people will be prevented from accessing a page they know to read, if they
do not know the language.
This cacologic however might be a good way to solve the IDN homograph issue
and the phishing problem.
If we revert from those famous "scripts" to what they are, i.e. Unicode
partitions, hence stable and well-documented charsets
(