1 Feb 04:31
UTF8 BOM
Darren Cook <darren <at> dcook.org>
2007-02-01 03:31:36 GMT
I'm editing UTF-8 files on Linux and got bitten by the unexpected BOM character being inserted at the front of the file. As others [1][2] have said, "UTF-8 with BOM" and "UTF-8 without BOM" would be less confusing names. (As SciTE doesn't actually write the cookie, there seems to be no need to mention it.)

Now that I've stripped the BOM character out, when I reopen the file it opens as 8-bit, not UTF-8 (though SciTE seems to understand it as UTF-8, and doesn't corrupt it if I then re-save). So it seems 8-bit means UTF-8 as well, at least with my settings [3]. I think this is what someone meant in the Dec 2006 thread when they suggested "8-bit" should be called "default"?

Darren

[1]: http://www.lyra.org/pipermail/scite-interest/2006-December/008325.html
[2]: http://www.mail-archive.com/scite-interest <at> lyra.org/msg02649.html
[3]:
    # Internationalisation
    # Japanese input code page 932 and ShiftJIS character set 128
    #code.page=932
    #character.set=128
    # Unicode
    code.page=65001
    #code.page=0
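
For anyone else who needs to strip the BOM from files outside the editor, here is a minimal sketch in Python. It is not how SciTE does it; the function names has_utf8_bom and strip_utf8_bom are made up for illustration, and it only assumes that the BOM is the three bytes EF BB BF at the very start of the file.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Report whether the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its BOM. Returns True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

If the goal is only to read the text, decoding with Python's "utf-8-sig" codec should also drop a leading BOM when one is present, without touching the file on disk.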