Sebastian Pipping | 7 Jul 05:59 2007

Introducing the uriparser library


Hello!

uriparser is a strictly RFC 3986 compliant URI parsing library written
in ANSI C. It is cross-platform, fast, supports Unicode and is licensed
under the New BSD license.

Version 0.4.1 was released yesterday.
The project can be found here:

http://uriparser.sourceforge.net/

Sebastian

Sebastian Pipping | 9 Jul 01:24 2007

[Need advice] URIs as part of the XSPF specification


Hello!

I have a URI-related problem here that I would love to hear
some advice on:

== Intro ==
XSPF [1] is an XML-based playlist format using URIs to list
locations of a track. The specification references the
obsoleted RFC 2396 when talking about URIs.

== Problem ==
The XSPF v1 spec [2] requires URIs to conform to RFC 2396 [3]. As you
know, that RFC was superseded by RFC 3986, which now includes IPv6, for
example. The question now is whether URIs with an IPv6 address
should still count as valid XSPF v1. Should the reference to
RFC 2396 in the XSPF spec be read as "the latest RFC for URIs"?
I think XSPF should follow the best-practice way of dealing
with RFC updates. Can you tell me what it is?
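To make the question concrete, the URIs at stake are those with a bracketed IP-literal host such as "http://[2001:db8::1]:8080/track.ogg", which RFC 3986 allows but RFC 2396 on its own does not (IPv6 literals were originally added to it by the RFC 2732 update). Below is a minimal C sketch of a check an XSPF tool might use to flag such locations; has_ip_literal_host is a hypothetical name, and the address inside the brackets is not validated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Sketch: does this absolute URI use a bracketed IP-literal host, as in
 * "http://[2001:db8::1]:8080/track.ogg"? RFC 3986 allows this host form;
 * a parser implementing RFC 2396 without its RFC 2732 update would not.
 * The address inside the brackets is NOT validated here. */
static bool has_ip_literal_host(const char *uri)
{
    const char *p = uri;
    assert(uri != NULL);

    /* scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) */
    if (!isalpha((unsigned char)*p))
        return false;
    while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
        p++;

    /* Only URIs with an authority ("scheme://...") can have a host. */
    if (p[0] != ':' || p[1] != '/' || p[2] != '/')
        return false;

    const char *auth = p + 3;
    const char *end = auth + strcspn(auth, "/?#"); /* authority ends here */

    /* Skip an optional "userinfo@" prefix. */
    const char *host = auth;
    for (const char *q = auth; q < end; q++)
        if (*q == '@')
            host = q + 1;

    return *host == '[' && memchr(host, ']', (size_t)(end - host)) != NULL;
}
```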

Thanks in advance for any enlightenment.

Sebastian

[1] http://www.xspf.org/
[2] http://www.xspf.org/xspf-v1.html
[3] http://www.xspf.org/xspf-v1.html#rfc.section.2.3.1

John Cowan | 9 Jul 02:16 2007

Re: [Need advice] URIs as part of the XSPF specification


Sebastian Pipping wrote:

> I think XSPF should follow the best-practice way for dealing
> with RFC updates. Can you tell me what it is?

You can refer to STD 66.  This is a logical name for whatever the current
RFC for URI Syntax may be.  Note that only a few RFCs have STD numbers:
the list is at http://www.rfc-editor.org/rfcxx00.html .

-- 
John Cowan  cowan <at> ccil.org  http://ccil.org/~cowan
The competent programmer is fully aware of the strictly limited size of his own
skull; therefore he approaches the programming task in full humility, and among
other things he avoids clever tricks like the plague.  --Edsger Dijkstra

Sebastian Pipping | 9 Jul 02:26 2007

Re: [Need advice] URIs as part of the XSPF specification


John Cowan wrote:
> You can refer to STD 66.  This is a logical name for whatever the current
> RFC for URI Syntax may be.  Note that only a few RFCs have STD numbers:
> the list is at http://www.rfc-editor.org/rfcxx00.html .

As I understand you, we should have written STD 66 in the
XSPF spec.  Any ideas what to do about the fact that,
unfortunately, we did not?

Sebastian

John Cowan | 9 Jul 02:44 2007

Re: [Need advice] URIs as part of the XSPF specification


Sebastian Pipping wrote:

> As I understand you, we should have written STD 66 in the
> XSPF spec.  Any ideas what to do about the fact that,
> unfortunately, we did not?

Issue an erratum.  Or pretend you did.

-- 
John Cowan  <cowan <at> ccil.org>  http://www.ccil.org/~cowan
        Subtle is the Lord, but malicious He is not.
                --Albert Einstein

Sebastian Pipping | 9 Jul 03:08 2007

Re: [Need advice] URIs as part of the XSPF specification


John Cowan wrote:
> Issue an erratum.  Or pretend you did.

:-) We already have an errata document and I just
filled in the blank entry for the note on the URI RFC [1].

Thanks for your help, John!

Sebastian

[1] http://wiki.xiph.org/index.php/XSPF_v1_Notes_and_Errata#URI_RFC_used

Sebastian Pipping | 21 Jul 20:45 2007

[Need advice] When to decode '+' to ' '?


Hello!

Say I have a percent-encoded URI and decode
all the percent-encoded triplets. The result will
sometimes still carry '+'s that represent spaces.
What do I do with those? How do I fully decode
the URI without harming '+'s that do not represent
spaces? And the other way around: should
I always percent-encode '+'s to preserve them?
I didn't find anything about it in RFC 3986.
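For reference, the generic RFC 3986 decoding step only turns each "%XX" triplet into the octet XX; '+' has no special meaning at that level and is left as it is. A minimal C sketch (uri_percent_decode and hex_value are hypothetical names; copying malformed triplets through literally is an assumption, a strict decoder might reject them instead):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Generic RFC 3986 percent-decoding: each "%XX" becomes the octet XX;
 * every other character, including '+', is copied unchanged.
 * `out` must hold at least strlen(in) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t uri_percent_decode(const char *in, char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        int hi, lo;
        if (in[0] == '%' && (hi = hex_value((unsigned char)in[1])) >= 0
                         && (lo = hex_value((unsigned char)in[2])) >= 0) {
            out[n++] = (char)(hi * 16 + lo);
            in += 3;
        } else {
            out[n++] = *in++;      /* '+' and malformed '%' stay as-is */
        }
    }
    out[n] = '\0';
    return n;
}
```

So "a+b%20c%2Bd" decodes to "a+b c+d": only the form-encoding convention, not RFC 3986, would turn that first '+' into a space.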

Any help appreciated. Thanks in advance!

Sebastian

Mike Brown | 21 Jul 21:40 2007

Re: [Need advice] When to decode '+' to ' '?


I think you're confusing general percent-encoding in URIs with the rules for 
producing application/x-www-form-urlencoded data. They're related, but 
distinct.

In any case, sender and receiver must agree; if you (the receiver) know the 
data is of the application/x-www-form-urlencoded media type, then you should 
not be blindly applying the modern, general RFC 3986 percent-encoding rules to 
it to interpret it. You must decode it using the reverse of the encoding 
process.

As described in the HTML specs, such data is divided into "&"-separated 
"name=value" pairs, it uses "+" instead of "%20" for space, has had newlines 
normalized to "%0D%0A", and has had "non-alphanumeric"/"reserved" characters 
percent-encoded. This section of the specs predates HTML becoming 
Unicode-friendly, so there is a great deal of ambiguity in exactly which 
characters are percent-encoded and how, but in practice, implementations 
generally align with RFC 3986 when deciding which characters to encode.

So, to encode a set of name-value pairs (character data from an HTML form):

1. In each name and value, encode each CR, LF, or CR+LF to "%0D%0A".

2. In each name and value, encode each space as "+", and percent-encode any 
other character that won't be unambiguous in a URI, especially "+", "&", and 
"=".

3. Insert "=" between each name and value, and "&" between each pair.
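The three steps can be sketched in C as follows. The function names are hypothetical, and leaving exactly the RFC 3986 unreserved characters unescaped is an assumption, since the HTML specs are vague about which characters must be encoded. Input is encoded byte by byte, so non-ASCII text should be passed in already encoded as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Steps 1 and 2 for a single name or value. Writes a NUL-terminated
 * result at `out` and returns a pointer to that NUL. Worst case, each
 * input byte (a lone CR or LF) grows to six output bytes. */
static char *form_encode_component(const char *s, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(s != NULL && out != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '\r' || c == '\n') {
            if (c == '\r' && s[1] == '\n')
                s++;                   /* CR LF counts as one line break */
            memcpy(out, "%0D%0A", 6);  /* step 1 */
            out += 6;
        } else if (c == ' ') {
            *out++ = '+';              /* step 2: space */
        } else if (is_unreserved(c)) {
            *out++ = (char)c;
        } else {                       /* step 2: '+', '&', '=', non-ASCII, ... */
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
    return out;
}

/* Step 3: "name=value" pairs joined with "&". `out` must be big enough. */
static void form_encode_pairs(const char *names[], const char *values[],
                              size_t count, char *out)
{
    char *p = out;
    *p = '\0';
    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            *p++ = '&';
        p = form_encode_component(names[i], p);
        *p++ = '=';
        p = form_encode_component(values[i], p);
    }
}
```

With names {"q", "note"} and values {"a+b & c=d", "line1\r\nline2"}, this produces "q=a%2Bb+%26+c%3Dd&note=line1%0D%0Aline2".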

To decode:
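A minimal sketch of the decoding direction, assuming it simply reverses the encoding steps: split the string on "&", split each pair at the first "=", then turn '+' into a space and percent-decode. Doing both in one left-to-right pass matters: percent-decoding first and replacing '+' afterwards would turn an encoded "%2B" into a space. The names below are hypothetical, the string is modified in place, and "%0D%0A" simply comes back as CR LF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes one name or value in place: '+' -> space and "%XX" -> octet,
 * in a single pass, so "%2B" correctly yields '+'. */
static void form_decode_component(char *s)
{
    char *out = s;
    assert(s != NULL);
    while (*s != '\0') {
        int hi, lo;
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (s[0] == '%' && (hi = hex_value((unsigned char)s[1])) >= 0
                               && (lo = hex_value((unsigned char)s[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *out++ = *s++;             /* malformed '%' kept as-is */
        }
    }
    *out = '\0';
}

/* Splits `query` in place into at most `max` decoded pairs and returns
 * how many were found. A pair without "=" gets an empty value. */
static size_t form_decode_pairs(char *query, char *names[], char *values[],
                                size_t max)
{
    size_t n = 0;
    char *seg = query;
    while (seg != NULL && n < max) {
        char *amp = strchr(seg, '&');
        if (amp != NULL)
            *amp = '\0';               /* terminate this pair */
        if (*seg != '\0') {            /* skip empty pairs as in "a=1&&b=2" */
            char *eq = strchr(seg, '=');
            if (eq != NULL) {
                *eq = '\0';
                values[n] = eq + 1;
            } else {
                values[n] = seg + strlen(seg);
            }
            names[n] = seg;
            form_decode_component(names[n]);
            form_decode_component(values[n]);
            n++;
        }
        seg = (amp != NULL) ? amp + 1 : NULL;
    }
    return n;
}
```

For "q=a%2Bb+%26+c%3Dd&flag", this yields the pairs ("q", "a+b & c=d") and ("flag", "").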

Sebastian Pipping | 22 Jul 18:25 2007

Re: [Need advice] When to decode '+' to ' '?


Mike,

thank you for your detailed explanation!

Sebastian

Mike Brown | 22 Jul 19:30 2007

Re: [Need advice] When to decode '+' to ' '?


Sure :) I hope it helps.

I'd say the general percent-encoding rules in RFC 3986 are more for the 
benefit of future URI scheme specs to make sure they're universally forward 
and backward compatible. For example, if someone is coming up with a zzxcvb:// 
scheme, their spec needs to make sure it doesn't invent any oddball 
percent-encoding rules or designate new characters as 'reserved'.
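As an illustration of those general rules: a conservative encoder leaves only the unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~") alone and writes every other octet as "%XX" with uppercase hex digits, as RFC 3986 recommends. Under these rules space becomes "%20" and '+' becomes "%2B"; there is no '+'-for-space convention. A sketch (uri_percent_encode is a hypothetical name); note that it also escapes '/', so it suits a single path segment or query value rather than a whole path.

```c
#include <assert.h>
#include <string.h>

/* Conservative RFC 3986 percent-encoding of one component's data.
 * `out` needs room for 3 * strlen(in) + 1 bytes. */
static void uri_percent_encode(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~') {
            *out++ = (char)c;          /* unreserved: never encode */
        } else {
            *out++ = '%';              /* everything else, incl. ' ' and '+' */
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
}
```

For example, the UTF-8 string "a b+c/é" comes out as "a%20b%2Bc%2F%C3%A9".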

The underspecification of application/x-www-form-urlencoded has long been a 
thorn in my side, particularly when it comes to standardizing the encoding 
used as the basis for percent-encoding non-ASCII characters. XForms 1.0 
is the first spec to say to use UTF-8, but that's too little, too late.

Mike
