1 Aug 2003 10:24
Re: encoding declarations
Burnard Towers <lou.burnard <at> COMPUTING-SERVICES.OXFORD.AC.UK>
2003-08-01 08:24:16 GMT
Michael Beddow remarks:

> That means that any entities intended as generic boilerplate
> constituents for a wide variety of documents should always have
> a (correct!) text declaration. That way, they will be usable no
> matter what the encoding of the document into which they are
> included. Similarly, precisely in the context of a corpus where
> more than one encoding may have been employed, it is wise to take
> precautions against hard-to-trace encoding muddles by furnishing
> each entity with an appropriate text declaration.

A further aspect of this which has always perplexed me is that (unless
I'm mistaken) the encoding of an entity which embeds another one does
*not* become the default for the embedded entity.

In other words, if I have a corpus of non-UTF-8 encoded entities, it is
not enough simply to stick an appropriate encoding declaration on the
outermost entity which embeds all the others: if they don't have their
own declarations, they will default to UTF-8 and things will go
Horribly Wrong.

L
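
The defaulting behaviour described above can be illustrated with a minimal Python sketch. It is not a real XML parser: the helper names `entity_encoding` and `decode_entity` are hypothetical, and the sketch assumes no byte-order mark, no UTF-16, and no external encoding information (such as an HTTP header). Under those assumptions, each entity is decoded according to its own text declaration, or as UTF-8 if it has none, whatever the embedding document declares.

```python
import re

# Matches a text declaration at the very start of an entity, e.g.
#   <?xml version="1.0" encoding="ISO-8859-1"?>
# (the version part is optional in a text declaration).
_TEXT_DECL = re.compile(
    rb'^<\?xml(?:\s+version\s*=\s*["\'][^"\']*["\'])?'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']\s*\?>'
)


def entity_encoding(data: bytes) -> str:
    """Return the encoding assumed for one entity (hypothetical helper).

    Simplification: BOMs, UTF-16 and external encoding information are
    ignored, so an entity without a text declaration is treated as UTF-8.
    """
    match = _TEXT_DECL.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def decode_entity(data: bytes) -> str:
    """Decode an entity's content, dropping its text declaration if any."""
    match = _TEXT_DECL.match(data)
    body = data[match.end():] if match else data
    return body.decode(entity_encoding(data))


# The same ISO-8859-1 content, with and without its own declaration.
bare = "<p>caf\u00e9</p>".encode("iso-8859-1")
declared = b'<?xml encoding="ISO-8859-1"?>' + bare

print(entity_encoding(declared))  # taken from the entity's own declaration
print(entity_encoding(bare))      # falls back to UTF-8

try:
    decode_entity(bare)
except UnicodeDecodeError as err:
    # The undeclared Latin-1 bytes are not valid UTF-8.
    print("undeclared entity failed to decode:", err.reason)
```

Note that the encoding of the outermost document never enters into either function: that is the point of the sketch, and the reason each entity in a non-UTF-8 corpus needs its own declaration.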