Lisa Dusseault | 3 Jan 22:01

Lisa's Apps Area Activity for December 2006


Document Status and progress

Active documents:
 - ABNF (RFC 4234) did a last call on Dec 11; got comments and currently discussing with authors
 - WIDEX Requirements (draft-ietf-widex-requirements): did a last call which ended Dec 25, got comments from David Black (Gen-art reviewer) and discussing with authors
 - draft-ietf-atompub-protocol-12: need to review changes against my comments
 - SMTP Auth (draft-siemborski-rfc2554bis) and POP3 Auth (draft-siemborski-1734bis): agreed to shepherd; currently discussing what Mandatory-to-Implement authentication mechanism is best. 
 - draft-snell-atompub-feed-license: tried to get comments from IETF-last-call reviewers on new version... 
 - draft-nottingham-atompub-feed-history: I'm behind on reviewing this one

Stalled documents (next action is not in my court):
 - draft-ietf-usefor-usefor: waiting to see how progress on WG's usepro document turns out and whether a normative dependency there is the right thing to add to usefor.
 - draft-ietf-imapext-sort: Waiting for BASIC collation draft to progress
 - draft-ietf-imapext-annotate: waiting for RFC Ed notes from authors to finalize approval
 - draft-ietf-sieve-3028bis: waiting for another version with IANA considerations
 - draft-gulbrandsen-collation-basic: need new version

WGs

ATOMPUB: Discussing some draft -12 protocol issues and media types related to atom format RFC
CALSIFY: somewhat of a slowdown over holidays.
IMAPEXT: New version of draft-ietf-imapext-i18n but otherwise quiet
SIEVE: WGLC on draft-ietf-sieve-notify
USEFOR: Great active discussion on protocol document edited by Russ Allbery.
WIDEX: Will close WG once requirements draft is published.

Other Apps Area Activity

HTTP: participants are thinking of a BOF, so we need to determine what the scope of a WG would be
State-machines: Stephane Bortzmeyer started a list to prepare a BOF on state-machine description languages
Email server event notifications: discussion at notifications <at> ietf.org about requirements and use cases arising from Lemonade and elsewhere.
SMTP/RFC2821, advancement to Draft Standard: some discussion at ietf-smtp <at> imc.org, volunteer draft editor found
URI Templates: interesting discussions at uri <at> w3.org

Lisa
Lisa Dusseault | 6 Jan 04:06

Fwd: New mailing list: language for IETF state machines

FYI 

Lisa

Begin forwarded message:

From: Stephane Bortzmeyer <bortzmeyer <at> nic.fr>
Date: January 5, 2007 12:25:32 PM PST
Subject: New mailing list: language for IETF state machines

While IETF has a formal standardized language to describe grammars
(ABNF, in RFC 4234), it has no language to describe state machines,
leaving authors to use tables or list of transitions or ASCII-art
(which leads some people to ask for a "richer" format for RFCs).

I believe it would be a good idea to have such a language (see
http://www1.ietf.org/mail-archive/web/ietf/current/msg42592.html for a
rationale) so I wrote an Internet-draft
(draft-bortzmeyer-language-state-machines-01.txt) describing a
candidate, Cosmogol (further documented in http://www.cosmogol.fr/).

There is now a mailing list, to see if there is sufficient interest
for the IETF to go on, have a BoF in Prague, maybe create a Working
Group, etc:

cosmogol <at> ietf.org

https://www1.ietf.org/mailman/listinfo/cosmogol

_______________________________________________
Ietf mailing list

John C Klensin | 19 Jan 16:34

FWD: I-D ACTION:draft-klensin-unicode-escapes-00.txt

Hi.

At the Apps Area meeting at the last IETF, there was a
discussion of ways of referring to or escaping a Unicode
character in an ASCII protocol or context.  I took an action to
write a short I-D that explicitly recommended the use of forms
that reference Unicode code points directly rather than, e.g.,
encoding the octets of UTF-8.

That document has been posted and, with the concurrence of the
ADs, discussion has been directed to the discuss <at> apps.ietf.org
list.

If you have any interest in internationalization issues, a
careful reading of, and comments on, this proposal would be
greatly appreciated.  

In the absence of comments, I will be pushing for Last Call
fairly soon.

thanks,
     john
From: <Internet-Drafts <at> ietf.org>
Subject: I-D ACTION:draft-klensin-unicode-escapes-00.txt
Date: 2007-01-18 20:50:02 GMT
A New Internet-Draft is available from the on-line Internet-Drafts 
directories.

	Title		: ASCII Escaping of Unicode Characters
	Author(s)	: J. Klensin
	Filename	: draft-klensin-unicode-escapes-00.txt
	Pages		: 8
	Date		: 2007-1-18
	
   There are a number of circumstances in which an escape mechanism is
   needed in conjunction with a protocol to encode characters that
   cannot be represented or transmitted directly.  With ASCII coding the
   traditional escape has been either the decimal or hexadecimal offset
   of the character, written in a variety of different ways.  The move
   to Unicode, where characters occupy two or more octets and may be
   coded in several different forms, has further complicated the
   question of escapes.  This document discusses some of the options now in
   use and makes a proposal for general use in IETF protocols.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-klensin-unicode-escapes-00.txt

To remove yourself from the I-D Announcement list, send a message to 
i-d-announce-request <at> ietf.org with the word unsubscribe in the body of 
the message. 
You can also visit https://www1.ietf.org/mailman/listinfo/I-D-announce 
to change your subscription settings.

Internet-Drafts are also available by anonymous FTP. Login with the 
username "anonymous" and a password of your e-mail address. After 
logging in, type "cd internet-drafts" and then 
"get draft-klensin-unicode-escapes-00.txt".

A list of Internet-Drafts directories can be found in
http://www.ietf.org/shadow.html 
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt

Internet-Drafts can also be obtained by e-mail.

Send a message to:
	mailserv <at> ietf.org.
In the body type:
	"FILE /internet-drafts/draft-klensin-unicode-escapes-00.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.

Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.
Attachment: message/external-body, 142 bytes
Attachment (draft-klensin-unicode-escapes-00.txt): message/external-body, 69 bytes
_______________________________________________
I-D-Announce mailing list
I-D-Announce <at> ietf.org
https://www1.ietf.org/mailman/listinfo/i-d-announce
Keith Moore | 19 Jan 17:35

Re: FWD: I-D ACTION:draft-klensin-unicode-escapes-00.txt

based on a quick scan, it mostly looks fine to me, with a few caveats:

- it should be clear that this is for newly-designed protocols only.  it 
shouldn't be interpreted as a request to change existing protocols 
(including deployed and nonstandard protocols being standardized by 
IETF), as this would generally break backward compatibility by changing 
the meaning of '\'

- it should be clear that this is for occasional use of non-ASCII 
characters within a protocol field that is constrained to contain only 
ASCII characters (or a subset), rather than a recommendation for how to 
represent non-ASCII characters in a protocol field that is capable of 
carrying, say, UTF-8.

if the text already makes this clear, I apologize for not reading the 
document more carefully.

it might also be worthwhile to recommend this notation for use in 
describing character literals in RFCs, in such a way that it could be 
referenced by such RFCs.

Keith

> At the Apps Area meeting at the last IETF, there was a
> discussion of ways of referring to or escaping a Unicode
> character in an ASCII protocol or context.  I took an action to
> write a short I-D that explicitly recommended the use of form
> that reference Unicode code points directly rather than, e.g.,
> encoding the octets of UTF-8.
> 
> That document has been posted and, with the concurrence of the
> ADs, discussion has been directed to the discuss <at> apps.ietf.org
> list.
> 
> If you have any interest in internationalization issues, a
> careful reading of, and comments on, this proposal would be
> greatly appreciated.  
> 
> In the absence of comments, I will be pushing for Last Call
> fairly soon.

John C Klensin | 19 Jan 17:38

Typographical error in draft-klensin-unicode-escapes-00

Hi.

It has been called to my attention that I should have checked
the ABNF syntax rules more carefully before posting this draft.
The productions for Hex-quad and Full-form should not have
spaces between the repeat count and the token.  The right sides
should read 
   4*4HexDigit   and
   2*2Hex-Quad
respectively.

Fixed in the working draft -- I'm not going to create confusion
by re-posting at this time.

     john

Keith Moore | 19 Jan 17:43

Re: FWD: I-D ACTION:draft-klensin-unicode-escapes-00.txt

one more caveat: protocol specifications need to specify this notation 
explicitly (either directly or by reference to the published RFC) if 
they are going to use it. conversely, this notation SHOULD NOT (maybe 
MUST NOT) be used unless it is part of the protocol specification.

one problem I see with introducing any new notation for characters is 
that it does create normalization issues by introducing additional ways 
to say the same thing.  e.g. after introducing this notation, "A" could 
also be expressed as \u0041 or \U00000041.  and it then becomes 
necessary to manage this conversion when copying fields from one 
protocol that supports the new notation (or in which it is benign) to 
another protocol that does not support the notation.

as an example of a potential source of problems, I'd hate to see this 
notation end up in X.509 certs.
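
A minimal sketch of the normalization concern above, in Python and not
taken from the draft: once the notation is allowed, several spellings of
the same field must be reduced to one form before they can be compared
or copied into a protocol that does not know the notation. The helper
name unescape is hypothetical, and the sketch assumes the \u (four hex
digits) and \U (eight hex digits) forms discussed in this thread.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# Python regexes are case-sensitive by default, so "u" and "U" stay distinct.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape(field: str) -> str:
    """Replace each escape with the character it names (hypothetical helper).

    chr() raises ValueError for values above 0x10FFFF, which is the
    behaviour wanted here: such a sequence is not a valid code point.
    """
    def _sub(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(_sub, field)


# Three spellings of the same one-character field.
variants = ["A", r"\u0041", r"\U00000041"]
normalized = {unescape(v) for v in variants}
print(normalized)  # {'A'}
```

A gateway between two protocols would have to run a step like this (or
its inverse) on every field it copies, which is the management burden
described above.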

John C Klensin | 19 Jan 18:40

Re: FWD: I-D ACTION:draft-klensin-unicode-escapes-00.txt


--On Friday, 19 January, 2007 11:35 -0500 Keith Moore
<moore <at> cs.utk.edu> wrote:

> based on a quick scan, it mostly looks fine to me, with a few
> caveats:
> 
> - it should be clear that this is for newly-designed protocols
> only.  it shouldn't be interpreted as a request to change
> existing protocols (including deployed and nonstandard
> protocols being standardized by IETF), as this would generally
> break backward compatibility by changing the meaning of '\'

That was intended to be clear already.  If it is not
sufficiently so, suggested text, or at least a place to put it,
would be welcome.

More generally, I'm not completely wild about the \u and \U
conventions, as you can probably deduce from the text.  It
seemed like the best choice on balance, but this is an area in
which, were a consensus to emerge around something else, I would
enthusiastically agree.

> - it should be clear that this is for occasional use of
> non-ASCII characters within a protocol field that is
> constrained to contain only ASCII characters (or a subset),
> rather than a recommendation for how to represent non-ASCII
> characters in a protocol field that is capable of carrying,
> say, UTF-8.
> 
> if the text already makes this clear, I apologize for not
> reading the document more carefully.

I don't know if it is clear enough or not.   At some level, if
you didn't conclude that it was clear on reading the draft, then
that is evidence that it isn't clear enough... but I don't know
how carefully you read it.  Certainly I agree with both points
and would welcome suggestions as to how they can be clarified if
they are not sufficiently clear.

> it might also be worthwhile to recommend this notation for use
> in describing character literals in RFCs, in such a way that
> it could be referenced by such RFCs.

Yeah.  The document does address this, but (deliberately) more
or less in passing.   Again, I'd welcome specific community
input and suggestions about this.  I've looked at several RFCs
and U+NNNN seems to be the preferred format for character
literals and, more commonly, for identifying the code point
associated with a named character.  It is also, fwiw, the one I
prefer for that purpose.  But it is fairly poor for inline use
in a protocol.  The authoritative definition and reference for
that form is the "Code Points" section of "Appendix A:
Notational Conventions" of Unicode 5.0 (the reference to the
book is in the I-D).  I don't believe that we add value by
repeating that definition in an RFC, but could be easily talked
out of that position.

From your followup note...

> one more caveat: protocol specifications need to specify this
> notation explicitly (either directly or by reference to the
> published RFC) if they are going to use it. conversely, this
> notation SHOULD NOT (maybe MUST NOT) be used unless it is part
> of the protocol specification.

Please suggest text for specifying those rules.  I constructed
this rather more as advice to protocol designers and, to a
lesser extent, to document authors, rather than a base for
notational definitions to be included by reference.  That could
be changed, but I'd welcome textual suggestions.

> one problem I see with introducing any new notation for
> characters is that it does create normalization issues by
> introducing additional ways to say the same thing.  e.g. after
> introducing this notation, "A" could also be expressed as
> \u0041 or \U00000041.  and it then becomes necessary to manage
> this conversion when copying fields from one protocol that
> supports the new notation (or in which it is benign) to
> another protocol that does not support the notation.

We are, unfortunately, already _deep_ into that problem.  Part
of what motivated this draft was to prevent it from getting
worse.  And, using the example above, the conversion between
\u0041 and \U00000041 is obvious.  You can even do it by eye,
and the relationship to %41 is equally clear.  However, consider
"A" with a diaeresis (U+00C4 -- note the use of the "code
point" notation here).  We then have \u00C4 and \U000000C4,
which are easily mapped visually to each other and to the code
point form, which can easily be looked up in a table if you
don't know what it represents.  But it is also, if I have done
the calculation correctly, %C3%84 and that form (used in URIs
and IRIs) is seriously non-intuitive and certainly can't be
converted visually.
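
The forms compared above can be computed rather than worked out by hand.
The following is a small Python sketch, not part of the draft; the
helper names code_point_notation and code_point_escape are hypothetical.
urllib.parse.quote is the standard-library function that percent-encodes
the UTF-8 octets of a string.

```python
from urllib.parse import quote


def code_point_notation(ch: str) -> str:
    """U+NNNN form used in prose to name a character (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


def code_point_escape(ch: str) -> str:
    """\\uNNNN for BMP characters, \\UNNNNNNNN otherwise (hypothetical helper)."""
    cp = ord(ch)
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"


a_diaeresis = "\u00C4"  # LATIN CAPITAL LETTER A WITH DIAERESIS

print(code_point_notation(a_diaeresis))  # U+00C4
print(code_point_escape(a_diaeresis))    # \u00C4
print(quote(a_diaeresis))                # %C3%84
```

The first two outputs share the digits 00C4; the percent-encoded form
shares none of them, which is the point being made about visual
conversion.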

> as an example of a potential source of problems, I'd hate to
> see this notation end up in X.509 certs.

To which I'd have to say "what would you like to see?".  My
personal answer is that I'd prefer to see those certs contain
UTF-8 encoding with escapes used only to discuss or present the
certs in contexts in which the UTF-8 and native characters are
not available (and maybe if it is, for explanation and clarity).
But I don't think this spec has much impact on that problem.  If
it should, I'd welcome text suggestions.

General observation: I generated this document because it became
clear that someone needed to do something.  The alternative was
that we continue to drift toward an "every protocol makes its own
choices, ignoring all others" situation, with many of them
favoring escaped UTF-8 (which I consider to be nearly the worst
possible choice for most circumstances... for reasons that
should be clear even from the trivial examples above).  But I
have no particularly strong commitment to any particular
recommendation as long as we establish a recommendation.   From
this point forward, I'm more or less acting as secretary/editor
for whatever comments and consensus emerge rather than trying to
advocate a particular solution or specification.   One
implication of that is that if anyone believes that additional
text, clarification, or specification belongs in the document, I
would really appreciate very specific textual suggestions.

     john

Stephane Bortzmeyer | 21 Jan 22:06

Escaping the escape (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt)

On Fri, Jan 19, 2007 at 10:34:27AM -0500,
 John C Klensin <klensin <at> jck.com> wrote 
 a message of 179 lines which said:

> In the absence of comments, I will be pushing for Last Call fairly
> soon.

I may be blind but I do not see a discussion about how to escape the
escape sequence. For instance, when discussing the XML standard
(&#nnnn;), there is no mention that it requires a way to write a bare
& (&amp;).

In the same way, if you use the \unnnn standard, how do you ensure
that you can still write a real \u string if you want it?

It seems this is important enough to postpone the Last Call.
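
One possible answer, sketched in Python purely as an illustration: treat
a doubled backslash as a literal backslash, in the same way that XML
reserves &amp; for a bare ampersand. This convention is an assumption
made for the sketch and is not something the draft specifies; the
function names are hypothetical.

```python
import re


def escape(text: str) -> str:
    """Encode non-ASCII characters, doubling any literal backslash first."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # assumed convention: \\ means a real backslash
        elif cp < 0x80:
            out.append(ch)              # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


# A doubled backslash is tried first so it is never read as the start of \u or \U.
_TOKEN = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape(text: str) -> str:
    """Inverse of escape() under the same assumed convention."""
    def _sub(match: re.Match) -> str:
        if match.group(0) == "\\\\":
            return "\\"
        return chr(int(match.group(1) or match.group(2), 16))

    return _TOKEN.sub(_sub, text)


literal = r"\u00C4"           # six ASCII characters, not an escape
print(escape(literal))        # \\u00C4
print(escape("\u00C4"))       # \u00C4
print(unescape(escape(literal)) == literal)  # True
```

Without some rule of this kind, the six-character string and the single
character would encode to the same bytes and could not be told apart on
decoding.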

Stephane Bortzmeyer | 21 Jan 22:12

Re: Typographical error in draft-klensin-unicode-escapes-00

On Fri, Jan 19, 2007 at 11:38:56AM -0500,
 John C Klensin <john-ietf <at> jck.com> wrote 
 a message of 18 lines which said:

> It has been called to my attention that I should have checked the
> ABNF syntax rules more carefully before posting this draft.

There is another strange thing: 

   BMP-form =  "\u" Hex-quad
   Full-form =  "\U" 2*2 Hex-quad

How can you tell one from the other before you encounter the second
Hex-quad? By the case of the U, as it seems from reading the text
following the grammar?

If so, this is wrong because ABNF strings are case-insensitive so one
should use:

   BMP-form  =  "\" %x75 Hex-quad   
   Full-form =  "\" %x55 2*2Hex-quad
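
The difference between the two readings of the grammar can be sketched
with regular expressions in Python. This is an illustration only: the
pattern names are hypothetical, and the hex-digit class accepting both
cases is an assumption about the draft's HexDigit rule. The "loose"
patterns mimic ABNF quoted strings, which match case-insensitively; the
"strict" patterns mimic the %x75 and %x55 forms suggested above.

```python
import re

HEX_QUAD = r"[0-9A-Fa-f]{4}"

# Literal reading of the draft: quoted strings ignore case, so "\u" also matches "\U".
BMP_LOOSE = re.compile(r"\\u" + HEX_QUAD, re.IGNORECASE)
FULL_LOOSE = re.compile(r"\\U" + HEX_QUAD * 2, re.IGNORECASE)

# Corrected reading: the letter is matched as an exact octet (%x75 = "u", %x55 = "U").
BMP_STRICT = re.compile(r"\\u" + HEX_QUAD)
FULL_STRICT = re.compile(r"\\U" + HEX_QUAD * 2)

sample = r"\U000000C4"  # intended as a Full-form escape

# Under the loose rules the BMP-form rule also accepts a prefix of the sample,
# so a parser cannot tell the two forms apart from the letter alone.
print(BMP_LOOSE.match(sample).group(0))   # \U0000
print(FULL_LOOSE.match(sample).group(0))  # \U000000C4

# Under the strict rules only Full-form matches.
print(BMP_STRICT.match(sample))                   # None
print(FULL_STRICT.fullmatch(sample) is not None)  # True
```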

