RE: about vcard encoding
Misha Wolf <Misha.Wolf <at> reuters.com>
2006-05-26 17:10:05 GMT
Not quite correct. UTF-8 represents the ASCII character
repertoire using the same bit combinations as does ASCII.
For example, "A" is 41 hex.
This does *not* apply to the right-hand half of ISO 8859-1 /
Latin-1. These characters are encoded using two bytes in
UTF-8. For example, the copyright sign is C2 A9 hex.
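The two byte values above can be checked with a short Python sketch. The variable names are illustrative only; the script simply encodes the same two characters and prints the resulting bytes in hex.

```python
# ASCII characters keep their single-byte ASCII value in UTF-8.
ascii_bytes = "A".encode("utf-8")

# Characters from the right-hand half of Latin-1 take two bytes in UTF-8,
# but only one byte in ISO 8859-1 itself.
copyright_utf8 = "\u00a9".encode("utf-8")      # COPYRIGHT SIGN
copyright_latin1 = "\u00a9".encode("latin-1")

print(ascii_bytes.hex())       # 41
print(copyright_utf8.hex())    # c2a9
print(copyright_latin1.hex())  # a9
```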
All of this is beautifully shown in:
From: owner-imc-vcard <at> mail.imc.org [mailto:owner-imc-vcard <at> mail.imc.org]
On Behalf Of S. Isaac Dealey
Sent: 26 May 2006 16:46
To: imc-vcard <at> imc.org
Subject: Re: about vcard encoding
> To restate my question:
> A vCard line consists of three parts: a property name
> (for example, "N", "TEL", "ADR"), property parameters
> (for example, "ENCODING", "CHARSET"), and a property
> value. My question is about the property name and the
> property parameters. Because these two parts are used
> as notation, I think they should be encoded in the same
> character set and encoding in every vCard, so that
> different platforms can communicate seamlessly. But in
> the formal definitions of vCard 2.1 and vCard 3.0, which
> are written in ABNF, I only see that the two parts are
> defined as terminal values (for example,
> name = "LOGO" / "PHOTO").
> I have checked the ABNF specification (RFC 2234). It
> says that such strings use the US-ASCII character set
> and that the external encoding is not specified.
> So my question is: can these two parts of a vCard line
> be encoded in different formats (for example, ASCII or
> Unicode), or only in ASCII? And where can I find a
> document that says so?
I'm not sure about how data is stored in memory, although my
understanding is that a file system will only allow a single character
set to be associated with any given file. That being the case, if the
file is stored as a UTF-8 file, then whatever character set might be
declared for an individual property would have to be stored in the
file in UTF-8 as a declaration of what character set that value should
be converted to after the file is read.
Conversely that would mean that if the file is stored using the
latin-1 or ISO-8859-1 character set, then any property values that
aren't within the ASCII character set would need to be encoded using
some format which can be represented entirely in that character set,
such as Quoted-Printable or Base64.
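As a rough illustration of that last point, the sketch below uses Python's standard quopri and base64 modules to turn a non-ASCII value into pure ASCII bytes. The sample value and the vCard 2.1 style line built from it are assumptions made up for this example, not taken from any real card.

```python
import base64
import quopri

# Hypothetical property value containing one non-ASCII character.
value = "Caf\u00e9"
utf8_bytes = value.encode("utf-8")

# Both transfer encodings produce output that uses only ASCII bytes,
# so it survives storage in an ASCII or Latin-1 file unchanged.
qp = quopri.encodestring(utf8_bytes)
b64 = base64.b64encode(utf8_bytes)

# Illustrative vCard 2.1 style line carrying the quoted-printable value.
line = b"FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:" + qp

print(qp)
print(b64)
print(line.decode("ascii"))
```

Decoding reverses the process: quopri.decodestring(qp) and base64.b64decode(b64) both return the original UTF-8 bytes.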
Probably this is the reason why the vCard 3.0 standard doesn't allow
the charset parameter for individual properties: the file can
only be stored with a single character set, and presumably it is
better (easier?) to simply choose one character set that will support
most of the card's content than to design an
implementation that can manage character sets for individual
properties.
Since the standard for vCard says the property names ("logo"/"photo")
must be stored in ASCII you might think that this would prevent us
from storing vCards with UTF-8 character set. I think UTF-8 gives us
something of a reprieve, however, because the UTF-8 character set
functions differently than some other character sets, containing first
the complete set of latin-1 characters and representing them as
single-byte characters instead of double-byte characters. As I
understand it, this means that unlike other character sets, any ASCII
characters represented in UTF-8 _are_ single-byte ASCII characters.
The end result of this would mean that you can store a vCard with the
UTF-8 character set and the file will still conform to the requirement
for the property names to be stored in ASCII.
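A small Python sketch of that conclusion, under the assumption of a made-up minimal card: the whole card is encoded as UTF-8, and each property name is then decoded strictly as ASCII, which would raise an error if any name contained a non-ASCII byte. Only the ASCII range is relied on here; the non-ASCII characters in the value are multi-byte in UTF-8.

```python
# Hypothetical minimal card with non-ASCII characters in one value.
card = (
    "BEGIN:VCARD\r\n"
    "VERSION:3.0\r\n"
    "FN:Jos\u00e9 Mu\u00f1oz\r\n"
    "END:VCARD\r\n"
)
data = card.encode("utf-8")

names = []
for raw_line in data.split(b"\r\n"):
    if not raw_line:
        continue
    # Everything before the first ":" is the property name (no parameters here).
    name, _, _value = raw_line.partition(b":")
    # Strict ASCII decoding fails loudly on any byte >= 0x80.
    names.append(name.decode("ascii"))

has_non_ascii = any(b >= 0x80 for b in data)
print(names)
print(has_non_ascii)
```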
My understanding of this may not be complete or completely accurate,
since it's not my area of expertise, but that's how I've come to
understand it.
s. isaac dealey 434.293.6201
new epoch : isn't it time for a change?
add features without fixtures with
the onTap open source framework