Internet-Drafts | 3 Jan 21:50 2006

I-D ACTION:draft-ietf-geopriv-revised-civic-lo-01.txt

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Geographic Location/Privacy Working Group of the IETF.

	Title		: Revised Civic Location Format for PIDF-LO
	Author(s)	: M. Thomson, J. Winterbottom
	Filename	: draft-ietf-geopriv-revised-civic-lo-01.txt
	Pages		: 18
	Date		: 2006-1-3
	
   This document defines an XML format for the representation of civic
   location.  This format is designed for use with PIDF Location Object
   (PIDF-LO) documents.  The format is based on the civic address
   definition in PIDF-LO, but adds several new elements based on the
   civic types defined for DHCP, and adds a hierarchy to address complex
   road identity schemes.  The format also includes support for the
   xml:lang language tag and restricts the types of elements where
   appropriate.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-geopriv-revised-civic-lo-01.txt

To remove yourself from the I-D Announcement list, send a message to 
i-d-announce-request <at> ietf.org with the word unsubscribe in the body of the message.  
You can also visit https://www1.ietf.org/mailman/listinfo/I-D-announce 
to change your subscription settings.

Internet-Drafts are also available by anonymous FTP. Login with the username
"anonymous" and a password of your e-mail address. After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-geopriv-revised-civic-lo-01.txt".

Andrew Newton | 4 Jan 15:48 2006

Re: [Geopriv] Domain identifier in common policy

Henning,

Sorry to bring this up again, but Hannes made me reread the text and  
I caught a nit.

On Nov 14, 2005, at 7:32 PM, Henning Schulzrinne wrote:
> I looked at RFC 3490 (IDNA) in more detail just now. For the  
> 'domain' attribute, I think we can simplify the comparison process  
> as follows:
>
> --- begin text ---
>
> Common policy MUST use UTF-8 to store domain names in 'id' domain  
> attributes. For non-IDNs, lower-case ASCII SHOULD be used.

[ snip ]...

Is "MUST use UTF-8" intended to rule out UTF-16, which all XML  
parsers are required to understand?  That is fine if it is, but this  
would then seem to restrict the whole document to UTF-8.  If this is a  
protocol requirement, it needs to be stated.

-andy
Henning Schulzrinne | 5 Jan 17:44 2006

Re: [Geopriv] Domain identifier in common policy

I don't feel strongly about this; my only gut feeling is that it is a 
good thing to reduce options that don't add value. As far as I know, the 
set of strings representable by UTF-8 is exactly the same as the one 
representable in UTF-16. What is the advantage of allowing UTF-16?

(One disadvantage is that any comparison would have to convert to a 
common format if both are allowed.)

>> Common policy MUST use UTF-8 to store domain names in 'id' domain 
>> attributes. For non-IDNs, lower-case ASCII SHOULD be used.

> Is "MUST use UTF-8" intended to purposefully rule out UTF-16, which all 
> XML parsers are required to understand?  That is fine if it is, but this 
> would seem to then restrict the document to UTF-8.  If this is a 
> protocol requirement, it needs to be stated.
> 
> -andy
Andrew Newton | 5 Jan 20:04 2006

Re: [Geopriv] Domain identifier in common policy


On Jan 5, 2006, at 11:44 AM, Henning Schulzrinne wrote:

> I don't feel strongly about this; my only gut feeling is that it is  
> a good thing to reduce options that don't add value. As far as I  
> know, the set of strings representable by UTF-8 is exactly the same  
> as the one representable in UTF-16. What is the advantage of  
> allowing UTF-16?
>
> (One disadvantage is that any comparison would have to convert to a  
> common format if both are allowed.)

Unless you foresee the development of special purpose XML parsers for  
this application, I can see no advantage to ruling out UTF-16.  As  
for your stated disadvantage, which XML parsers pass raw bytes to the  
application instead of a common format?  Admittedly there are gobs of  
XML parsers out there, but I've never seen one that does this.

-andy 
Henning Schulzrinne | 6 Jan 10:39 2006

Re: [Geopriv] Domain identifier in common policy

How do XML parsers pass back string elements to the application?

> Unless you foresee the development of special purpose XML parsers for 
> this application, I can see no advantage to ruling out UTF-16.  As for 
> your stated disadvantage, which XML parsers pass raw bytes to the 
> application instead of a common format?  Admittedly there are gobs of 
> XML parsers out there, but I've never seen one that does this.
> 
> -andy
Andrew Newton | 6 Jan 14:43 2006

Re: [Geopriv] Domain identifier in common policy

Usually they pass back the element content as a Unicode-compatible  
string or an array of Unicode-compatible characters, and the  
application does not need to make a distinction between UTF-8 and  
UTF-16.  Some languages have native Unicode support, but in cases  
where that is not true, special types are defined to represent XML  
character data.

-andy
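
A minimal sketch of the behaviour described above, assuming Python's
standard xml.etree.ElementTree module (which is built on Expat). The
element name "one", the attribute name "domain" and the sample value
are hypothetical and are not taken from the common-policy schema. The
same document is serialized once as UTF-8 and once as UTF-16; the raw
bytes differ, but the parser hands the application the same string.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule fragment; the element and attribute names are made up
# for illustration and are not taken from the common-policy schema.
DOC = '<?xml version="1.0" encoding="{enc}"?><one domain="b\u00fccher.example"/>'


def domain_from(raw: bytes) -> str:
    """Parse an encoded XML document and return its 'domain' attribute."""
    return ET.fromstring(raw).attrib["domain"]


utf8_bytes = DOC.format(enc="UTF-8").encode("utf-8")
# Python's "utf-16" codec prepends a byte order mark, which the parser
# uses to detect the encoding.
utf16_bytes = DOC.format(enc="UTF-16").encode("utf-16")

same_bytes = utf8_bytes == utf16_bytes
same_value = domain_from(utf8_bytes) == domain_from(utf16_bytes)
print(same_bytes, same_value)
```

With a parser of this kind the comparison happens on decoded strings, so
the source encoding of each document does not reach the application.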

On Jan 6, 2006, at 4:39 AM, Henning Schulzrinne wrote:

> How do XML parsers pass back string elements to the application?
>
>> Unless you foresee the development of special purpose XML parsers  
>> for this application, I can see no advantage to ruling out  
>> UTF-16.  As for your stated disadvantage, which XML parsers pass  
>> raw bytes to the application instead of a common format?   
>> Admittedly there are gobs of XML parsers out there, but I've never  
>> seen one that does this.
>> -andy
>
Andrew Newton | 6 Jan 14:55 2006

Re: [Geopriv] Domain identifier in common policy


On Jan 5, 2006, at 11:44 AM, Henning Schulzrinne wrote:

> I don't feel strongly about this; my only gut feeling is that it is  
> a good thing to reduce options that don't add value. As far as I  
> know, the set of strings representable by UTF-8 is exactly the same  
> as the one representable in UTF-16. What is the advantage of  
> allowing UTF-16?
>
> (One disadvantage is that any comparison would have to convert to a  
> common format if both are allowed.)


BTW, from BCP 70, last paragraph in Section 5.1:

   Restricting XML data to only be expressed in UTF-8 is an additional
   syntactic restriction (see Section 4.3) which, depending on
   circumstances, might add additional implementation complexity.

-andy
Henning Schulzrinne | 6 Jan 15:23 2006

Re: [Geopriv] Domain identifier in common policy

FWIW, I just picked the first XML application that came to mind, in PHP. 
It doesn't support UTF-16. Source: http://de.php.net/xml and 
http://de.php.net/manual/en/ref.xml.php#xml.encoding

From all I can tell for PHP, it represents strings in their 'native' 
(byte) representation, so that if you have two XML documents, one using 
8859-1 and one UTF-8, you need to convert between them to compare 
literal strings outside the ASCII range. 
(http://de.php.net/manual/en/language.types.string.php)
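
To illustrate the comparison problem for byte-oriented strings, here is
a small sketch in Python rather than PHP. The helper name same_text and
the sample value are assumptions made for the example. It shows that
two byte strings holding the same text in ISO 8859-1 and in UTF-8 differ
as raw bytes once a character falls outside the ASCII range, and compare
equal only after both are decoded to a common form.

```python
def same_text(a: bytes, a_enc: str, b: bytes, b_enc: str) -> bool:
    """Compare two byte strings by decoding each from its own encoding."""
    return a.decode(a_enc) == b.decode(b_enc)


name = "M\u00fcnchen"  # contains one character outside the ASCII range
latin1 = name.encode("iso-8859-1")
utf8 = name.encode("utf-8")

# Raw byte comparison fails because the non-ASCII character is encoded
# as one byte in ISO 8859-1 and as two bytes in UTF-8.
raw_equal = latin1 == utf8

# Decoding both sides first gives the comparison the text intends.
decoded_equal = same_text(latin1, "iso-8859-1", utf8, "utf-8")

# Pure ASCII text is byte-identical in both encodings.
ascii_equal = "example.com".encode("iso-8859-1") == "example.com".encode("utf-8")

print(raw_equal, decoded_equal, ascii_equal)
```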

I don't see why restricting character sets to UTF-8 complicates processing.

Andrew Newton wrote:
> Usually they pass back the element content as a unicode compatible 
> string or an array of unicode compatible characters, and the application 
> does not need to make a distinction between UTF-8 and UTF-16.  Some 
> languages have native Unicode support, but in cases where that is not 
> true special types are defined to represent xml character data.
> 
> -andy
> 
> On Jan 6, 2006, at 4:39 AM, Henning Schulzrinne wrote:
> 
>> How do XML parsers pass back string elements to the application?
>>
>>> Unless you foresee the development of special purpose XML parsers for 
>>> this application, I can see no advantage to ruling out UTF-16.  As 
>>> for your stated disadvantage, which XML parsers pass raw bytes to the 
>>> application instead of a common format?  Admittedly there are gobs of 
>>> XML parsers out there, but I've never seen one that does this.
>>> -andy
>>
Andrew Newton | 6 Jan 16:13 2006

Re: [Geopriv] Domain identifier in common policy

If it doesn't support UTF-16 as the source encoding, it is not  
compliant with the XML standard.

BTW, thanks for the reference point with PHP.  I started looking  
around to see what various parsers using languages without native  
Unicode do.  Both libxml2 and Xerces-C define datatypes for xml  
character data.  libxml2 always hands the application UTF-8 but it  
supports UTF-16 as a source encoding (making one draw the conclusion  
that it transcodes to UTF-8 when UTF-8 is not the source encoding).   
Expat does the same, but there is a compile-time option to have the  
parser hand the application UTF-16 instead of UTF-8.  From what I can  
tell, PHP is the exception and not the rule.

-andy

On Jan 6, 2006, at 9:23 AM, Henning Schulzrinne wrote:

> FWIW, I just picked the first XML application that came to mind, in  
> PHP. It doesn't support UTF-16. Source: http://de.php.net/xml and  
> http://de.php.net/manual/en/ref.xml.php#xml.encoding
>
> From all I can tell for PHP, it represents strings in their  
> 'native' (byte) representation, so that if you have two XML  
> documents, one using 8859-1 and one UTF-8, you need to convert  
> between them to compare literal strings outside the ASCII range.  
> (http://de.php.net/manual/en/language.types.string.php)
>
> I don't see why restricting character sets to UTF-8 complicates  
> processing.
>
> Andrew Newton wrote:
>> Usually they pass back the element content as a unicode compatible  
>> string or an array of unicode compatible characters, and the  
>> application does not need to make a distinction between UTF-8 and  
>> UTF-16.  Some languages have native Unicode support, but in cases  
>> where that is not true special types are defined to represent xml  
>> character data.
>> -andy
>> On Jan 6, 2006, at 4:39 AM, Henning Schulzrinne wrote:
>>> How do XML parsers pass back string elements to the application?
>>>
>>>> Unless you foresee the development of special purpose XML  
>>>> parsers for this application, I can see no advantage to ruling  
>>>> out UTF-16.  As for your stated disadvantage, which XML parsers  
>>>> pass raw bytes to the application instead of a common format?   
>>>> Admittedly there are gobs of XML parsers out there, but I've  
>>>> never seen one that does this.
>>>> -andy
>>>
Henning Schulzrinne | 6 Jan 17:03 2006

Re: [Geopriv] Domain identifier in common policy

At least, it's not a general problem...

I still fail to see the practical advantage of allowing UTF-16, given 
that we control the input and that this is not general text, but rather 
strings with a very specific purpose.

Andrew Newton wrote:
> If it doesn't support UTF-16 as the source encoding, it is not compliant 
> with the XML standard.
> 
> BTW, thanks for the reference point with PHP.  I started looking around 
> to see what various parsers using languages without native Unicode do.  
> Both libxml2 and Xerces-C define datatypes for xml character data.  
> libxml2 always hands the application UTF-8 but it supports UTF-16 as a 
> source encoding (making one draw the conclusion that it transcodes to 
> UTF-8 when UTF-8 is not the source encoding).  Expat does the same, but 
> there is a compile-time option to have the parser hand the application 
> UTF-16 instead of UTF-8.  From what I can tell, PHP is the exception and 
> not the rule.
> 
> -andy
> 
> On Jan 6, 2006, at 9:23 AM, Henning Schulzrinne wrote:
> 
>> FWIW, I just picked the first XML application that came to mind, in 
>> PHP. It doesn't support UTF-16. Source: http://de.php.net/xml and 
>> http://de.php.net/manual/en/ref.xml.php#xml.encoding
>>
>> From all I can tell for PHP, it represents strings in their 'native' 
>> (byte) representation, so that if you have two XML documents, one 
>> using 8859-1 and one UTF-8, you need to convert between them to 
>> compare literal strings outside the ASCII range. 
>> (http://de.php.net/manual/en/language.types.string.php)
>>
>> I don't see why restricting character sets to UTF-8 complicates 
>> processing.
>>
>> Andrew Newton wrote:
>>> Usually they pass back the element content as a unicode compatible 
>>> string or an array of unicode compatible characters, and the 
>>> application does not need to make a distinction between UTF-8 and 
>>> UTF-16.  Some languages have native Unicode support, but in cases 
>>> where that is not true special types are defined to represent xml 
>>> character data.
>>> -andy
>>> On Jan 6, 2006, at 4:39 AM, Henning Schulzrinne wrote:
>>>> How do XML parsers pass back string elements to the application?
>>>>
>>>>> Unless you foresee the development of special purpose XML parsers 
>>>>> for this application, I can see no advantage to ruling out UTF-16.  
>>>>> As for your stated disadvantage, which XML parsers pass raw bytes 
>>>>> to the application instead of a common format?  Admittedly there 
>>>>> are gobs of XML parsers out there, but I've never seen one that 
>>>>> does this.
>>>>> -andy
>>>>