Bruce Lilly | 1 Feb 14:00

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Bruce Lilly wrote:

> I think that the following does the right thing (obs-utext can be empty
> or it can start or end with any ASCII octet and can have any sequence
> except CRLF, which is handled in unstructured by FWS, and unstructured
> can be completely empty or it can begin or end with utext or FWS, but
> any instance of utext is separated from any other instance by FWS):
> 
> obs-utext       =       *(*obs-char (*LF / (*CR 1*obs-char))) *CR
> 
> unstructured    =       *(utext FWS) *utext

Oops, the last line should be:

unstructured    =       *(*utext FWS) *utext

Pete Resnick | 2 Feb 18:12

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


On 2/1/04 at 8:00 AM -0500, Bruce Lilly wrote:

>>I think that the following does the right thing (obs-utext can be 
>>empty or it can start or end with any ASCII octet and can have any 
>>sequence except CRLF, which is handled in unstructured by FWS, and 
>>unstructured can be completely empty or it can begin or end with 
>>utext or FWS, but any instance of utext is separated from any other 
>>instance by FWS):
>>
>>obs-utext = *(*obs-char (*LF / (*CR 1*obs-char))) *CR
>>
>>unstructured = *(utext FWS) *utext
>
>Oops, the last line should be:
>
>unstructured    =       *(*utext FWS) *utext

No, that doesn't look right. It allows multiple adjacent occurrences 
of FWS. How about:

unstructured = *(1*utext FWS) *utext
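
The difference between the three candidate rules can be checked mechanically. The following is only a minimal sketch: it abstracts each utext to the token 'u' and each FWS to the token 'F', and it assumes that a utext is never empty (which stops being true once obs-utext is allowed to match the empty string). The names RULES and accepts are made up for this illustration.

```python
import re

# Token abstraction (an assumption for illustration only):
#   'u' = one non-empty utext, 'F' = one FWS.
# Each ABNF candidate from the thread becomes a regex over these tokens.
RULES = {
    "*(utext FWS) *utext":   r"(?:uF)*u*",
    "*(*utext FWS) *utext":  r"(?:u*F)*u*",
    "*(1*utext FWS) *utext": r"(?:u+F)*u*",
}

def accepts(rule: str, tokens: str) -> bool:
    """Return True if the token string is generated by the named rule."""
    return re.fullmatch(RULES[rule], tokens) is not None

if __name__ == "__main__":
    # Print which rules accept a few small token strings.
    for tokens in ["", "uFu", "uuFu", "FF", "F"]:
        print(repr(tokens), {rule: accepts(rule, tokens) for rule in RULES})
```

Under this abstraction the rule with *utext inside the group accepts "FF" (two adjacent FWS), and the rule with 1*utext inside the group rejects it.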

pr
-- 
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102

Charles Lindsey | 3 Feb 13:10

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


In <401AFF6C.4030806 <at> verizon.net> Bruce Lilly <blilly <at> verizon.net> writes:

>I think that the following does the right thing (obs-utext can be empty
>or it can start or end with any ASCII octet and can have any sequence
>except CRLF, which is handled in unstructured by FWS, and unstructured
>can be completely empty or it can begin or end with utext or FWS, but
>any instance of utext is separated from any other instance by FWS):

>obs-utext       =       *(*obs-char (*LF / (*CR 1*obs-char))) *CR

So, if X is some random obs-char, "XCR" is an obs-utext, and "LF" is an
obs-utext. Therefore "XCRLF" is an unstructured. Q.N.E.D.

>unstructured    =       *(utext FWS) *utext

No, that is not right with either your subsequent fix or with Pete's
subsequent fix, because you allow two FWS adjacent, and Pete does not
allow an unstructured consisting of FWS and nothing else.
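
The "XCRLF" counter-example can be reproduced with a regular expression. This is a hedged sketch, not a full RFC 2822 parser: OBS_CHAR follows the RFC 2822 definition of obs-char (any ASCII octet except CR and LF), OBS_UTEXT is a direct transcription of the proposed obs-utext rule, and the helper names are invented for this illustration.

```python
import re

# obs-char per RFC 2822: any ASCII octet except CR (13) and LF (10).
OBS_CHAR = r"[\x00-\x09\x0b\x0c\x0e-\x7f]"

# Transcription of the proposed rule:
#   obs-utext = *(*obs-char (*LF / (*CR 1*obs-char))) *CR
OBS_UTEXT = re.compile(rf"(?:{OBS_CHAR}*(?:\n*|\r*{OBS_CHAR}+))*\r*")

def is_obs_utext(s: str) -> bool:
    """True if the whole string is one obs-utext under the proposed rule."""
    return OBS_UTEXT.fullmatch(s) is not None

def splits_into_two_obs_utext(s: str) -> bool:
    """True if the string is the concatenation of two adjacent obs-utext."""
    return any(
        is_obs_utext(s[:i]) and is_obs_utext(s[i:])
        for i in range(len(s) + 1)
    )

if __name__ == "__main__":
    print(is_obs_utext("X\r"), is_obs_utext("\n"))   # the two pieces
    print(is_obs_utext("X\r\n"))                     # one obs-utext?
    print(splits_into_two_obs_utext("X\r\n"))        # two adjacent ones?
```

A single obs-utext cannot contain CRLF, but any rule that lets two of them sit side by side (such as a trailing *utext) lets "X" CR followed by LF through.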

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl <at> clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Bruce Lilly | 4 Feb 01:54

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Charles Lindsey wrote:
> In <401AFF6C.4030806 <at> verizon.net> Bruce Lilly <blilly <at> verizon.net> writes:
> 
> 
>>I think that the following does the right thing (obs-utext can be empty
>>or it can start or end with any ASCII octet and can have any sequence
>>except CRLF, which is handled in unstructured by FWS, and unstructured
>>can be completely empty or it can begin or end with utext or FWS, but
>>any instance of utext is separated from any other instance by FWS):
> 
> 
>>obs-utext       =       *(*obs-char (*LF / (*CR 1*obs-char))) *CR
> 
> 
> So, if X is some random obs-char, "XCR" is an obs-utext, and "LF" is an
> obs-utext. Therefore "XCRLF" is an unstructured. Q.N.E.D.
> 
> 
>>unstructured    =       *(utext FWS) *utext
> 
> 
> No, that is not right with either your subsequent fix or with Pete's
> subsequent fix, because you allow two FWS adjacent, and Pete does not
> allow an unstructured consisting of FWS and nothing else.

OK, there's a problem with 2822; it needs an obs-unstructured as well
as unstructured (see section 4 introductory paragraphs, specifically
the reference to 3.2.3).  Here's one way to tie everything together:

[...]

Charles Lindsey | 4 Feb 13:44

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


In <402042BC.2050204 <at> verizon.net> Bruce Lilly <blilly <at> verizon.net> writes:

>unstructured = *(text [FWS])
>   assuming unstructured fields are defined as in my revised grammar, e.g.
>   comments = "Comments" ":" [FWS] unstructured CRLF
>   (see discussion below)
>   optionally one could define
>   utext = *(text [FWS])
>   and then define unstructured as utext, but what would be the point...

>obs-utext either as defined in 2822 or as above, i.e. empty, can start
>   or end with obs-char, CR, or LF, but can't have CRLF pair

Yes, but that is getting a long way from what seems to be the established
convention that *text things consist of just a single character (or
perhaps a single character with some naked CR or LF attached).

>obs-unstructured = *(obs-utext FWS) [obs-utext]
>   i.e. cannot have two adjacent instances of obs-utext strings (must
>   have FWS separator), may have multiple adjacent FWS instances (since
>   obs-utext may be empty, and in order to comply with the section 4
>   normative text regarding parsing of WS-only continuation lines), may
>   be empty, may begin or end with any obs-utext string or with FWS,
>   any CRLF pair is followed by WS (as part of FWS)

And I don't think we want two adjacent FWS. Your revised grammar went to
much trouble to avoid adjacent CFWS (or FWS in some cases), and that was
seen as a great improvement. Now they seem to have come back in.

[...]

Bruce Lilly | 5 Feb 04:41

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Charles Lindsey wrote:

> Yes, but that is getting a long way from what seems to be the established
> convention that *text things consist of just a single character (or
> perhaps a single character with some naked CR or LF attached).

?!?  "text" should instantiate a single character, and it does both
in 2822 and in the revised grammar.  "*text" by definition (see RFC
2234) means any number (0 to infinity) of occurrences of "text" and
therefore could not possibly be restricted to a single character.

> And I don't think we want two adjacent FWS. Your revised grammar went to
> much trouble to avoid adjacent CFWS (or FWS in some cases), and that was
> seen as a great improvement. Now they seem to have come back in.

RFC 2822 section 4:

   Another key difference between the obsolete and the current syntax is
   that the rule in section 3.2.3 regarding lines composed entirely of
   white space in comments and folding white space does not apply.  See
   the discussion of folding white space in section 4.2 below.

Permitting multiple adjacent FWS instances in the obs- constructs was
intended to comply with those parsing requirements, but I suppose that's
handled by obs-FWS (which, unlike the other productions in 2822, is not
left-justified in the RFC text).

> May I suggest you take another look at the grammar I originally suggested.

[...]

Charles Lindsey | 5 Feb 11:59

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


In <4021BB72.6010106 <at> verizon.net> Bruce Lilly <blilly <at> verizon.net> writes:

>Charles Lindsey wrote:

>> Yes, but that is getting a long way from what seems to be the established
>> convention that *text things consist of just a single character (or
>> perhaps a single character with some naked CR or LF attached).

>?!?  "text" should instantiate a single character, and it does both
>in 2822 and in the revised grammar.  "*text" by definition (see RFC
>2234) means any number (0 to infinity) of occurrences of "text" and
>therefore could not possibly be restricted to a single character.

Oh Dear! You misread what I wrote. When I said "*text", I didn't mean
"*text", I meant the set of rules named 'utext', 'ctext', 'dtext', etc.,
which all have the property that they produce just a single character.
Your latest offering seemed to be breaking with that convention.

>> And I don't think we want two adjacent FWS. Your revised grammar went to
>> much trouble to avoid adjacent CFWS (or FWS in some cases), and that was
>> seen as a great improvement. Now they seem to have come back in.

>RFC 2822 section 4:

>   Another key difference between the obsolete and the current syntax is
>   that the rule in section 3.2.3 regarding lines composed entirely of
>   white space in comments and folding white space does not apply.  See
>   the discussion of folding white space in section 4.2 below.

[...]

Bruce Lilly | 5 Feb 15:18

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Charles Lindsey wrote:

> Oh Dear! You misread what I wrote. When I said "*text", I didn't mean
> "*text", I meant the set of rules named 'utext', 'ctext', 'dtext', etc.,
> which all have the property that they produce just a single character.
> Your latest offering seemed to be breaking with that convention.

ctext, dtext, qtext, and text are single-octet productions (or should be).
In 2822, obs-text and obs-utext are not.

obs-text only appears in text (where obs-char makes more sense) and
obs-utext (which is defined as obs-text). obs-utext appears only in utext,
and utext appears only in unstructured.

I'll take another look at the unstructured and related productions.

> What I would like him to do would be
>     Subject: Re: foo
> but the proper way to address that issue would be to establish some rule
> or convention about whether subjects could be refolded or have WSP
> collapsed in the course of generating a followup/reply.
> 
> But following a strict reading of 3.6.5 in RFC 2822, I would argue that
> the only compliant way would be
>     Subject:Re:           foo
> from which it is evident that every MUA implementation known to me is
> non-compliant with RFC 2822 :-( . Another bug for Pete to worry over...

The main point is that any such "rule or convention" imposes structure
[...]

Bruce Lilly | 6 Feb 05:21

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Bruce Lilly wrote:

> obs-text only appears in text (where obs-char makes more sense) and
> obs-utext (which is defined as obs-text). obs-utext appears only in utext,
> and utext appears only in unstructured.
> 
> I'll take another look at the unstructured and related productions.

OK, here's another attempt.  Working backwards from unstructured fields:

comments         =   "Comments" ":" unstructured CRLF
obs-comments     =   "Comments" *WSP ":" [FWS] obs-unstructured CRLF
etc.

N.B. no [FWS] after the colon in the non-obs fields (an unstructured
field cannot be allowed to end with FWS (which it would do if the
"unstructured" production is empty) using non-obs rules, since that
would leave CRLF 1*WSP CRLF, i.e. a whitespace-only "continuation"
line at the end of the field, which is prohibited by sect. 3.2.3.).
Per contra, that must be permitted in the obs- unstructured fields.
That is why separate unstructured and obs-unstructured productions
are required.
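
The prohibited case can be made concrete with a small check for whitespace-only continuation lines. This is a sketch under stated assumptions: the field is passed as one raw string with CRLF line endings, including the terminating CRLF, and the function name is invented for this illustration.

```python
def has_ws_only_continuation(field: str) -> bool:
    """Return True if any continuation line of a raw header field
    consists solely of WSP (space or horizontal tab).

    Assumes CRLF line endings and that the first line holds the field name.
    """
    lines = field.split("\r\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]  # drop the empty piece after the final CRLF
    # lines[0] carries the field name; the rest are continuation lines.
    return any(
        line != "" and line.strip(" \t") == ""
        for line in lines[1:]
    )

if __name__ == "__main__":
    # "Comments:" [FWS] <empty unstructured> CRLF, with FWS = CRLF SP
    print(has_ws_only_continuation("Comments:\r\n \r\n"))
    # Ordinary folded field with content on the continuation line
    print(has_ws_only_continuation("Comments: foo\r\n bar\r\n"))
```

The first input is what an optional FWS after the colon would yield when the unstructured production is empty; the second is an ordinary fold.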

"unstructured" must therefore permit beginning with FWS, but only
if there is content after the FWS. It must permit an empty instance
so that the unstructured field body may be empty.  Since there is
[FWS] in the obs- unstructured fields, and we don't want an explicit
case of two adjacent instances of FWS (CRLF 1*WSP CRLF being provided
for by obs-FWS), a separate obs-unstructured is required, and it must
[...]

Bruce Lilly | 7 Feb 15:11

Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Two more points related to the 2822 grammar:
1. mixing vs. separating obs- and non-obs constructs
2. clarification of "obs"

Bruce Lilly wrote:
> comments         =   "Comments" ":" unstructured CRLF
> obs-comments     =   "Comments" *WSP ":" [FWS] obs-unstructured CRLF
> etc.
[...]
> That is why separate unstructured and obs-unstructured productions
> are required.
[...]
> unstructured     =   [[FWS] *(utext FWS) utext]
> 
> obs-unstructured =   *(utext FWS) [utext]

These could be partially unified as in 2822 as follows:

comments           =  "Comments" ":" unstructured CRLF
obs-comments       =  "Comments" *WSP ":" unstructured CRLF
unstructured       =  [[FWS] *(utext FWS) utext] / obs-unstructured
obs-unstructured   =  [FWS] *(utext FWS) [utext]
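
A rough way to compare the strict alternative of unstructured with obs-unstructured is to abstract each utext to the token 'u' and each FWS to the token 'F' and test small token strings. This is only a sketch: it assumes utext is never empty, and STRICT, OBS and classify are names invented for this illustration.

```python
import re

# Token abstraction (assumed for illustration): 'u' = one non-empty utext,
# 'F' = one FWS.
STRICT = re.compile(r"(?:F?(?:uF)*u)?")  # [[FWS] *(utext FWS) utext]
OBS = re.compile(r"F?(?:uF)*u?")         # [FWS] *(utext FWS) [utext]

def classify(tokens: str) -> tuple[bool, bool]:
    """Return (matches strict alternative, matches obs-unstructured)."""
    return (
        STRICT.fullmatch(tokens) is not None,
        OBS.fullmatch(tokens) is not None,
    )

if __name__ == "__main__":
    for tokens in ["", "Fu", "uFu", "uF", "F", "FF"]:
        print(repr(tokens), classify(tokens))
```

In this model the strict alternative never ends in FWS, obs-unstructured may consist of FWS alone or end in FWS, and neither produces two adjacent FWS.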

However, I think it's clearer to keep the obs- constructs out of
the non-obs productions. Indeed, it becomes clearer still if that
is pursued to its logical extreme:

comments           =  "Comments" ":" unstructured CRLF
obs-comments       =  "Comments" *WSP ":" obs-unstructured CRLF
[...]

