What is the default Character Set if character.set=0?
trismarck <trismarck <at> gmail.com>
2013-05-14 13:04:15 GMT
Hello,
I have a question about the default character set that SciTE uses when it opens a text file without a Unicode BOM (or any other indication of the file's character encoding).
There is a setting under File: Encoding called "Code Page Property". I've found that this option maps to the code.page setting in the SciTEGlobal.properties file (Options: Open Global Properties). What I have set there is code.page=0, which means that a _single_-byte code page will be used to display the contents of the text file.
On this newsgroup I've found the information that the code.page setting is actually only needed to determine _how many bytes_ each character in the text file translates to (I know there are code pages in which not every character maps to the same number of bytes), and not to select a particular character set. So, if code.page=0, each character in the text file is interpreted as a single byte.
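To show what I mean by "how many bytes per character", here is a minimal Python sketch. It is only my own illustration and has nothing to do with SciTE's code; the helper name byte_lengths and the choice of encodings (latin-1 as a single-byte set, cp932 as a double-byte code page, utf-8) are assumptions picked for the example.

```python
# Illustration only: how many bytes one character occupies in different encodings.
ENCODINGS = ["latin-1", "cp932", "utf-8"]

def byte_lengths(ch):
    """Map each encoding name to the encoded length of ch, or None if ch cannot be encoded."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            result[enc] = None
    return result

if __name__ == "__main__":
    # "A", LATIN SMALL LETTER E WITH ACUTE, and the CJK character U+65E5
    for ch in ["A", "\u00e9", "\u65e5"]:
        print(ascii(ch), byte_lengths(ch))
```

In a single-byte set every encodable character is one byte, while in cp932 an ASCII letter is one byte and a CJK character is two, which is the variable-width case I mentioned.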
The second setting is character.set. This setting determines which character set is used to map the bytes (or byte sequences) in the file to actual characters (the characters that this character set contains).
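As a small illustration of why the character set matters even when every character is one byte, the following Python sketch decodes the same byte under three single-byte character sets. This is just my example (decode_with is a made-up helper, and the three charsets are arbitrary picks), not a description of what SciTE does internally.

```python
# Illustration only: one byte, several single-byte character sets.
RAW = bytes([0xE9])  # a single byte with the high bit set

def decode_with(raw, charsets):
    """Decode raw under each named character set and return a name -> text mapping."""
    return {name: raw.decode(name) for name in charsets}

if __name__ == "__main__":
    for name, text in decode_with(RAW, ["latin-1", "cp1252", "cp1251"]).items():
        print(name, ascii(text))
```

Under latin-1 and cp1252 the byte 0xE9 is U+00E9 (e with acute), while under cp1251 it is U+0439 (a Cyrillic letter), so the bytes in the file are identical and only the chosen character set changes what is displayed.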
Now, the documentation tells me that character.set=0 means the 'Default' character set. This is where I get stuck: how does SciTE determine what this 'default' character set is? Does SciTE use some Windows API call to, e.g., determine the default code page of the OS?
I've sifted through the source code and found that, if SciTE uses GTK+ or Qt, the SC_CHARSET_DEFAULT macro (see Scintilla.h) maps to 0, which maps to ISO-8859-1 (see the CharacterSetID() function in PlatGTK.cxx and PlatQt.cxx). But I'm not sure whether the same happens on Windows: I couldn't find the function that determines the default character set when SciTE uses Windows controls. In ScintillaWin.cxx I've found the CodePageFromCharset() function, which takes the VisualStyle object; that object has a Style subobject with a .characterset property. However, I can't figure out how this property is set when SciTE uses Windows controls (or whether it is relevant to my original question at all).
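To be clear about what I mean by "the default code page of the OS": on Windows I am thinking of the ANSI code page that the GetACP() function in kernel32 returns. Below is a minimal Python sketch of querying it. It is purely an illustration of the value I have in mind; I am not claiming that SciTE or Scintilla obtains its default this way, and ansi_code_page is a made-up helper name.

```python
import ctypes
import sys

def ansi_code_page():
    """Return the Windows ANSI code page number, or None on non-Windows platforms."""
    if sys.platform != "win32":
        return None
    # GetACP() takes no arguments and returns the current ANSI code page identifier.
    return ctypes.windll.kernel32.GetACP()

if __name__ == "__main__":
    print(ansi_code_page())
```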
So the questions I have:
1. What is the default Character Set if character.set=0? (if SciTE uses Windows controls)
2. Which function in the SciTE source code determines the default Character Set? (Is this just a fixed setting, or does SciTE use Windows API calls? If it uses those calls, where can I find them in the code?)
3. How is the .characterset property of the Style subobject of the VisualStyle object set by default? (And is this at all relevant to what I'm asking about?)
What I want to do is to roughly understand how a text editor uses a given character encoding to read a text file and display the characters it has read. In particular, to test this, I need to know exactly which character set SciTE used to encode characters into the file (characters -> byte sequences).
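The kind of test I have in mind could look roughly like the Python sketch below: type known characters in the editor, save, read the raw bytes back, and check which candidate encodings would have produced exactly those bytes. The helper name matching_encodings and the candidate list are my own assumptions for the example; the sample bytes are written by hand here rather than taken from a real SciTE file.

```python
def matching_encodings(raw, expected_text, candidates):
    """Return the candidate encodings under which expected_text encodes to exactly raw."""
    matches = []
    for name in candidates:
        try:
            if expected_text.encode(name) == raw:
                matches.append(name)
        except UnicodeEncodeError:
            # expected_text contains a character this encoding cannot represent
            pass
    return matches

CANDIDATES = ["latin-1", "cp1252", "cp1251", "utf-8"]

if __name__ == "__main__":
    # Hand-written sample: "caf" followed by the single byte 0xE9
    print(matching_encodings(b"caf\xe9", "caf\u00e9", CANDIDATES))
    # The euro sign separates cp1252 from latin-1, which cannot encode it
    print(matching_encodings(b"\x80", "\u20ac", CANDIDATES))
```

With a real file, raw would come from reading the saved file in binary mode (open(path, "rb").read()). Characters that differ between the candidate sets, such as the euro sign, narrow the result more than characters the sets share.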
Cheers and sorry for the long post.