Jianbiao Guo | 25 May 12:01 2006

about vcard encoding

Hi all,
  I am writing a library that parses a vCard into a class, and I want to know whether the property names and property parameters must be encoded in US-ASCII. If the format is not standardized, different platforms will not be able to communicate with each other.
 
Thanks a lot
 
Best regards
 
John Guo
S. Isaac Dealey | 25 May 19:32 2006

Re: about vcard encoding


> Hi,all
>   I am writing a lib about parsing a vcard into one class.and I want
> to know if the property name and property parameters must be encoded
> in US-ASCII .If the format is not standard,different platform will
> can not communicate with each other.

Clarification of this would be helpful for me as well. I've been reading
the RFCs, and my understanding is that the file must be 7-bit ASCII,
which means anything that's not ASCII needs to be represented using an
encoding that produces only ASCII characters. The property names
shouldn't be an issue, because they're all part of the standard and
therefore all ASCII, but property values may contain other characters.

Certainly it would make sense for someone who speaks only non-Latin
languages to be able to include non-Latin notes in their vCard.

The vCard 2.1 specification includes two forms of encoding, described
specifically as solutions for binary content (graphics, sounds) and
multi-line content (a postal address with line breaks, for example):
the existing standards Quoted-Printable and Base64 (the latter being
specifically a text representation of binary data). I assume both of
these methods also address the problem of non-Latin characters.
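To make the two vCard 2.1 encodings concrete, here is a minimal Python sketch using only the standard `quopri` and `base64` modules. It assumes the non-Latin text is first converted to UTF-8 bytes, which the property would then have to announce with CHARSET=UTF-8; the NOTE value is made up for illustration.

```python
import base64
import quopri

# Hypothetical NOTE value containing non-Latin (Cyrillic) characters.
value = "Бележка"

# Both encodings work on bytes, so a charset must be chosen first (UTF-8 assumed).
raw = value.encode("utf-8")

qp_text = quopri.encodestring(raw).decode("ascii")
b64_text = base64.b64encode(raw).decode("ascii")

print("NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:" + qp_text)
print("NOTE;CHARSET=UTF-8;ENCODING=BASE64:" + b64_text)

# Both results are plain 7-bit ASCII and decode back to the original text.
assert qp_text.isascii() and b64_text.isascii()
assert quopri.decodestring(qp_text).decode("utf-8") == value
assert base64.b64decode(b64_text).decode("utf-8") == value
```

Either line could sit in a 2.1 file without leaving 7-bit ASCII; only the declared CHARSET tells a reader which character set the decoded bytes belong to.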

The vCard 3.0 specification no longer supports Quoted-Printable
encoding, requiring all content to be Base64 encoded. Unfortunately,
RFC 2426 never mentions Base64 by name; it refers only to "B" as the
designation for binary encoding, without specifying that it must be
Base64. Neither the vCard 2.1 specification nor the RFCs for 3.0
mention the RFCs for the encoding (1421 and 2045; I found those on
Wikipedia).

So the question I have (and this may be a "clueless newbie" question)
is: how would an application importing a vCard know that Base64-encoded
data is meant to be UTF-8 containing non-Latin characters?

s. isaac dealey     434.293.6201
new epoch : isn't it time for a change?

add features without fixtures with
the onTap open source framework

http://www.fusiontap.com
http://coldfusion.sys-con.com/author/4806Dealey.htm

Jianbiao Guo | 26 May 03:34 2006

Re: about vcard encoding

Hi Dealey,
  Thanks for your reply. In vCard 2.1 we can set the property parameter "CHARSET" to "UTF-8"; the vCard reader then knows that the property value uses UTF-8 and can convert it to Unicode. Nothing is lost, because UTF-8 is an encoding form of Unicode, and Unicode contains almost all the characters used by humans. If your platform supports Unicode, you can display the value to the user directly. In vCard 3.0, this parameter was moved to the mail header, so if you want to use it inside your vCard, I think you had better comply with the vCard 2.1 standard rather than vCard 3.0.
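The reading side of the CHARSET approach can be sketched in Python as follows. `decode_property_value` is a hypothetical helper, not part of any library; it assumes the parameters have already been parsed into a dict with upper-cased names, handles only QUOTED-PRINTABLE, BASE64 and unencoded values, and falls back to US-ASCII when no CHARSET is given.

```python
import base64
import quopri


def decode_property_value(raw_value: bytes, params: dict) -> str:
    """Decode a vCard 2.1 property value into a Unicode string.

    `params` maps upper-cased parameter names to values, e.g.
    {"CHARSET": "UTF-8", "ENCODING": "QUOTED-PRINTABLE"} (assumed format).
    """
    encoding = params.get("ENCODING", "7BIT").upper()
    # Assumption: no CHARSET parameter means plain US-ASCII.
    charset = params.get("CHARSET", "us-ascii")

    # Step 1: undo the transfer encoding to recover the original bytes.
    if encoding == "QUOTED-PRINTABLE":
        data = quopri.decodestring(raw_value)
    elif encoding == "BASE64":
        data = base64.b64decode(raw_value)
    else:  # 7BIT / 8BIT: the bytes already are the value
        data = raw_value

    # Step 2: CHARSET says how to turn those bytes into characters.
    return data.decode(charset)


print(decode_property_value(b"=D0=98=D0=B2=D0=B0=D0=BD",
                            {"CHARSET": "UTF-8", "ENCODING": "QUOTED-PRINTABLE"}))
# prints: Иван
```

In this sketch the Base64 or Quoted-Printable data itself says nothing about the character set; the importer learns it only from the CHARSET parameter.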
  About my question, let me explain it again:
  A vCard line can be described as consisting of three parts: the property name (for example, "N", "TEL", "ADR"), the property parameters (for example, "ENCODING", "CHARSET"), and the property value. My question is about the property name and the property parameters. Because these two parts are used as notation, I think they should be encoded in the same character set and encoding in every vCard, so that different platforms can communicate seamlessly. But from the formal definitions of vCard 2.1 and vCard 3.0, which are written in ABNF, I only know that the two parts are defined as terminal values (for example, name = "LOGO" / "PHOTO"). I have checked the ABNF specification (RFC 2234). It says that such strings use the US-ASCII character set and that the external encoding is not specified. So my specific question is: can the two parts of a vCard line be encoded in different formats (for example, ASCII or Unicode), or only in ASCII, and where can I find the document that says so?
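The three-part view of a content line maps onto a small parser. The Python sketch below is hypothetical and deliberately simplified: it splits at the first ':' (so quoted 3.0 parameter values containing ':' and group prefixes such as `A.TEL` are not handled) and treats the ABNF terminals as a rule that the name and parameters must be US-ASCII.

```python
def parse_content_line(line: str):
    """Split 'NAME;PARAM=VAL;...:value' into (name, params, value)."""
    head, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing ':' between name/parameters and value")

    # Names and parameters are ABNF terminals (US-ASCII strings), so reject
    # anything else before even looking at the value.
    if not head.isascii():
        raise ValueError(f"non-ASCII property name or parameter: {head!r}")

    name, *raw_params = head.split(";")
    params = {}
    for raw in raw_params:
        key, eq, val = raw.partition("=")
        # Bare vCard 2.1 parameters such as ';HOME' are stored with value None.
        params[key.upper()] = val if eq else None

    # The value is returned untouched; decoding it depends on ENCODING/CHARSET.
    return name.upper(), params, value


print(parse_content_line("TEL;HOME;VOICE:+1-555-0100"))
# ('TEL', {'HOME': None, 'VOICE': None}, '+1-555-0100')
```

Keeping the ASCII check on the name/parameter half only lets the same function accept lines whose values are 8-bit or encoded.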
 
  Thanks 
Best Regards
 
John Guo 

Zdravko Stoychev | 26 May 08:15 2006

Re: about vcard encoding

Hey, Jianbiao!
Jianbiao Guo wrote:
> Then my clear question is that if the two parts in a vcard line can be encoded in different format(for example,ASCII,UNICODE)

A vCard will always be in ASCII format, no matter whether some parts are in Unicode, UTF-7, etc. By using Quoted-Printable, Base64 or UTF-8, the final vCard is 'ASCII encoded', so you don't have to worry about it. The reason for using all those codecs is to transform the vCard into ASCII symbols only.

> or only encoded using ASCII and where can i find the document saying about it.
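The point that the encodings exist to keep the whole card within ASCII can be checked mechanically. In this Python sketch (the helper name `make_property_line` is hypothetical), it is the Quoted-Printable step, not UTF-8 by itself, that turns non-ASCII text into ASCII-only characters; values are kept short because longer Quoted-Printable output would be soft-wrapped at 76 characters.

```python
import quopri


def make_property_line(name: str, text: str) -> str:
    """Build a vCard 2.1 content line that uses only 7-bit ASCII."""
    if text.isascii():
        return f"{name}:{text}"
    # Non-ASCII text: declare UTF-8 and Quoted-Printable encode its bytes.
    encoded = quopri.encodestring(text.encode("utf-8")).decode("ascii")
    return f"{name};CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:{encoded}"


card = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:2.1",
    make_property_line("FN", "Здравко"),
    make_property_line("ORG", "MPS Ltd."),
    "END:VCARD",
])

# Every character of the finished card is 7-bit ASCII.
assert card.isascii()
print(card)
```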

-- 
Zdravko Stoychev
System Software and Support
MPS Ltd.
zdravko.stoychev <at> mps.bg
+359-2-971-2324 (ext.271)
If I don't reply to your e-mails, look here: http://6lyokavitza.org/mail
S. Isaac Dealey | 26 May 17:46 2006

Re: about vcard encoding


> Hi,dealey
>   About my question,I want to explain it again as follows:
> Because a vcard can describe as a line consisting of
> three parts:property name(for example ,"N","TEL","ADR")
> +property parameters(for example ,"ENCODING","CHARSET")
> +property values.My question is about the property name
> and property parameters.Because the property name and
> property parameters is used as a notation,so I think in
> all vcard,the two parts should be encoded in same
> character set and encoding so that different platform
> can communicate seamlessly.But in the formal definition
> of vcard2.1 and vcard3.0 which is written using the
> ABNF,I only know that the two parts are defined as
> terminal values(for example,name = "LOGO"/"PHOTO").
> I have checked the ABNF specification (RFC 2234).
> It say that the string using the us-ascii character set
> and the external encoding is not specified.

> Then my clear question is that if the two parts in a
> vcard line can be encoded in different format(for
> example,ASCII,UNICODE) or only encoded using ASCII and
> where can i find the document saying about it.

I'm not sure about how data is stored in memory, although my
understanding is that a file system will only allow a single character
set to be associated with any given file. That being the case, if the
file is stored as a UTF-8 file, then whatever character set might be
declared for an individual property would have to be stored in the
file in UTF-8 as a declaration of what character set that value should
be converted to after the file is read.

Conversely, that would mean that if the file is stored using the
Latin-1 (ISO-8859-1) character set, then any property values that
aren't within the ASCII character set would need to be encoded using
some format that can be represented entirely in that character set,
such as Quoted-Printable or Base64.

Probably this is the reason why the vCard 3.0 standard doesn't allow
the charset parameter for individual properties: the file can only be
stored with a single character set, and presumably it is better
(easier?) to simply choose one character set that supports most of
the card's content than to design an implementation that can manage
character sets for individual properties.

Since the vCard standard says the property names ("logo"/"photo")
must be stored in ASCII, you might think that this would prevent us
from storing vCards in the UTF-8 character set. I think UTF-8 gives us
something of a reprieve, however, because the UTF-8 character set
functions differently from some other character sets, containing first
the complete set of Latin-1 characters and representing them as
single-byte characters instead of double-byte characters. As I
understand it, this means that, unlike in other character sets, any
ASCII characters represented in UTF-8 _are_ single-byte ASCII
characters. The end result is that you can store a vCard in the UTF-8
character set and the file will still conform to the requirement for
the property names to be stored in ASCII.

My understanding of this may not be complete or completely accurate,
since it's not my area of expertise, but that's how I've come to
understand it.

s. isaac dealey     434.293.6201

Misha Wolf | 26 May 19:10 2006

RE: about vcard encoding


Not quite correct.  UTF-8 represents the ASCII character 
repertoire using the same bit combinations as does ASCII.
For example, "A" is 41 hex.

This does *not* apply to the right-hand half of ISO 8859-1 /
Latin-1.  These characters are encoded using two bytes in 
UTF-8.  For example, the copyright sign is C2 A9 hex.
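These byte values can be checked directly in Python. The last part of the sketch adds one related fact that matters for a vCard parser: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so splitting raw UTF-8 bytes on ASCII delimiters such as ':' and ';' cannot cut a non-ASCII character in half. The sample NOTE line is made up.

```python
# "A" has the same single byte in ASCII and UTF-8 (0x41).
assert "A".encode("ascii") == "A".encode("utf-8") == b"\x41"

# The copyright sign is one byte in Latin-1 but two bytes in UTF-8.
assert "©".encode("latin-1") == b"\xa9"
assert "©".encode("utf-8") == b"\xc2\xa9"

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so ASCII delimiters
# such as ':' and ';' never occur inside an encoded non-ASCII character.
assert all(b >= 0x80 for ch in "©Б" for b in ch.encode("utf-8"))

line = "NOTE;CHARSET=UTF-8;ENCODING=8BIT:© Бележка".encode("utf-8")
name_and_params, _, value = line.partition(b":")
assert name_and_params == b"NOTE;CHARSET=UTF-8;ENCODING=8BIT"
assert value.decode("utf-8") == "© Бележка"
```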

All of this is beautifully shown in:
   http://www.macchiato.com/unicode/chart/

Misha

-----Original Message-----
From: owner-imc-vcard <at> mail.imc.org [mailto:owner-imc-vcard <at> mail.imc.org]
On Behalf Of S. Isaac Dealey
Sent: 26 May 2006 16:46
To: imc-vcard <at> imc.org
Subject: Re: about vcard encoding
