Bruce Lilly | 13 Dec 18:25

Re: Interpretation of RFC 2047


Charles Lindsey wrote:

> Would it not be a sensible rule to say that you should decode any occurrence
> of =?<charset>?[BQ]?...?= (subject to the 76 character limit) in any
> header provided:
>     (a) it was immediately preceded by '(' or by CFWS
>     (b) it was immediately followed by ')' or by CFWS
>     (c) it was not contained within a quoted-string

(d) it was not part of a MIME parameter (RFC 2047 expressly forbids 2047
     encoding in MIME parameters; RFC 2231 provides a mechanism for parameters
     and also extends 2047 to include language tags)

... and more (see below)

> Actually, some parsing is required, because an encoded word in an
> unstructured header must have LWS (i.e. CFWS) on either side of it, whereas
> it can also have '(' and ')' immediately next to it in a structured header.

That's not accurate. First, LWS and CFWS are different: "(a) =?se2?q?x?="
(quotes for legibility only) is legal, whereas " (a)=?se2?q?x?=" is not.
Both have CFWS immediately before what looks like an encoded-word, but
only the former has LWS immediately before an encoded-word. Second, there
are many issues with parentheses: ")=?se2?q?x?=(" in a structured
header field that contains no other parentheses does not contain an
encoded-word.
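To make the LWS/CFWS distinction concrete, here is a minimal Python sketch (not from the thread) of the unstructured-text rule in RFC 2047 section 5(1): an encoded-word counts only as a whole token delimited by linear white space. The regular expression is a simplification that checks only the =?charset?B-or-Q?text?= shape plus the 75-character limit; `se2` is simply the placeholder charset from the examples above.

```python
import re

# Simplified encoded-word shape: =?charset?B|Q?encoded-text?=
# (does not validate the charset name or every encoded-text restriction)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")
MAX_ENCODED_WORD = 75  # RFC 2047 section 2 length limit


def encoded_words_in_unstructured(value: str) -> list[str]:
    """Return tokens of an unstructured field body that qualify as encoded-words.

    In unstructured text an encoded-word must be separated from adjacent
    text by linear white space, so only whole whitespace-delimited tokens
    are considered. "(a)" is NOT a separator here, because unstructured
    fields have no comments.
    """
    return [
        token
        for token in value.split()  # split on runs of white space
        if len(token) <= MAX_ENCODED_WORD and ENCODED_WORD.fullmatch(token)
    ]


print(encoded_words_in_unstructured("(a) =?se2?q?x?="))  # ['=?se2?q?x?=']
print(encoded_words_in_unstructured(" (a)=?se2?q?x?="))  # []
```

The second input has CFWS (the would-be comment) before the candidate but no LWS, so nothing qualifies.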

Other areas that immediately come to mind are:
1. RFC 2557 Content-Location, which permits URIs, which in turn (RFC 2396)

Bruce Lilly | 13 Dec 19:00

The sad state of MIME non-compliance


Recently I had occasion to send a message consisting of
introductory text plus two pages of content in two formats
(text/plain and application/pdf). The obvious MIME structure
would be

multipart/mixed
    text/plain (introductory text)
    multipart/alternative
       multipart/related
          text/plain (page 1 text)
          text/plain (page 2 text)
       multipart/related
          application/pdf (page 1)
          application/pdf (page 2)

Sadly, many of the MUAs currently on the market fail to display
such a message properly.  Of course, non-MIME text-only UAs will
display the message body.  Kmail 1.4.3 seems to do a reasonable
job of displaying the message correctly.  But Netscape/Mozilla,
MS Outlook [Express], Eudora, and probably others do not. Most
of these display only the introductory text.

A sample message has been submitted with Mozilla bug report
#184869, viewable at http://bugzilla.mozilla.org/show_bug.cgi?id=184869

It appears in most cases that it is the nested MIME multipart
structure that is not being correctly parsed; changing individual
media types seems to have little or no effect.
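For anyone who wants to reproduce the test, the structure above can be generated with Python's standard `email` package. This is a sketch, not the message attached to the bug report: the page texts and PDF bytes are placeholders, and the `type` parameter on each multipart/related part is added because RFC 2387 requires it.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_two_format_message(intro, page_texts, page_pdfs):
    """multipart/mixed: intro text plus a multipart/alternative of two related sets."""
    outer = MIMEMultipart("mixed")
    outer.attach(MIMEText(intro, "plain"))

    alternative = MIMEMultipart("alternative")
    # RFC 2387: multipart/related must carry a "type" parameter naming the root's type
    text_pages = MIMEMultipart("related", type="text/plain")
    for text in page_texts:
        text_pages.attach(MIMEText(text, "plain"))
    pdf_pages = MIMEMultipart("related", type="application/pdf")
    for pdf in page_pdfs:
        pdf_pages.attach(MIMEApplication(pdf, "pdf"))

    # Alternatives are ordered from least to most faithful (RFC 2046 section 5.1.4)
    alternative.attach(text_pages)
    alternative.attach(pdf_pages)
    outer.attach(alternative)
    return outer


def content_types(msg):
    """Depth-first list of content types, i.e. the tree above flattened."""
    return [part.get_content_type() for part in msg.walk()]


msg = build_two_format_message(
    "Introductory text.",
    ["Page 1 text.", "Page 2 text."],
    [b"%PDF-1.4 placeholder page 1", b"%PDF-1.4 placeholder page 2"],  # fake PDF bytes
)
# Serialise and re-parse: a conforming parser should recover the same nesting.
reparsed = message_from_bytes(msg.as_bytes())
print(content_types(reparsed) == content_types(msg))  # True
```

A receiving MUA that shows only the introductory text is, in effect, not descending past the first level of this tree.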


Bruce Lilly | 13 Dec 19:19

Re: RFC2231 encoding in parameters.


Valdis.Kletnieks <at> vt.edu wrote:

> 1) Are any MUAs "in the wild" currently actually using the 2047-style encoding
> in parameters rather than the 2231 syntax?  If so, who are they, and who wants
> to send the authors a note? ;)

As of a couple of months ago, Netscape/Mozilla did so by default.
There is an obscure, undocumented configuration parameter that
can be placed in the configuration file to change that to use
2231 encoding, but (again, as of a couple of months ago) it did
not work:
1. 8-bit file names were first 2047-encoded, and the result was then
    encoded via the 2231 mechanism
2. instead of a single language tag, a language list was used under
    some circumstances.
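Both defects can be checked mechanically. The following Python sketch (`rfc2231_problems` is a hypothetical helper, not anything Mozilla ships) splits an RFC 2231 extended value with the standard `email.utils.decode_rfc2231`, then flags a language field that is not a single tag and a decoded value that is itself an RFC 2047 encoded-word. The language-tag pattern is only a loose approximation of RFC 1766 syntax.

```python
import re
from email.utils import decode_rfc2231
from urllib.parse import unquote

# Loose RFC 1766-style tag: primary subtag plus optional "-subtag" parts
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def rfc2231_problems(extended_value: str) -> list[str]:
    """Report the two defects for a value of the form charset'language'%XX-text."""
    charset, language, encoded = decode_rfc2231(extended_value)
    if charset is None:
        return ["not a charset'language'value extended value"]
    problems = []
    # Defect 2: a list of languages where a single tag (or nothing) belongs
    if language and not LANGUAGE_TAG.fullmatch(language):
        problems.append(f"language field {language!r} is not a single tag")
    # Defect 1: the percent-decoded value is still an RFC 2047 encoded-word
    text = unquote(encoded, encoding=charset or "ascii", errors="replace")
    if text.startswith("=?") and text.endswith("?="):
        problems.append("value is an RFC 2047 encoded-word inside RFC 2231")
    return problems


print(rfc2231_problems("ISO-8859-1'en-us, en, fr, ru, ja'membership.xls"))
print(rfc2231_problems("iso-8859-1'fr'caf%E9.xls"))  # []
```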

I think a bug report was submitted, but a quick search at bugzilla
today didn't show anything relevant.

With the obscure configuration parameter in place, Mozilla 1.2a
produced the following (which, as the filename is us-ascii, does not
show problem #1):

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2a) Gecko/20020910
Content-Type: application/excel;
  name*=ISO-8859-1'en-us, en, fr, ru, ja'membership.xls
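For comparison with the output above, a correct RFC 2231 parameter carries one charset, at most one language tag, and percent-encoded octets. A minimal Python sketch using the standard `email.utils.encode_rfc2231` helper follows; `rfc2231_param` is a hypothetical wrapper, and the French file name is an invented example chosen to exercise a non-ASCII octet.

```python
from email.utils import encode_rfc2231


def rfc2231_param(attribute: str, value: str, charset: str, language: str = "") -> str:
    """Format attribute*=charset'language'%XX... as described in RFC 2231."""
    # The language field holds a single tag or nothing, never a list
    if any(ch in language for ch in ", "):
        raise ValueError(f"expected one language tag, got {language!r}")
    return f"{attribute}*={encode_rfc2231(value, charset, language)}"


print(rfc2231_param("name", "membership.xls", "ISO-8859-1", "en-us"))
# name*=ISO-8859-1'en-us'membership.xls
print(rfc2231_param("name", "café.xls", "ISO-8859-1", "fr"))
# name*=ISO-8859-1'fr'caf%E9.xls
```

Passing the language list from the Mozilla output raises `ValueError` instead of producing an invalid parameter.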


Re: The sad state of MIME non-compliance

On Fri, 2002-12-13 at 19:00, Bruce Lilly wrote:

> multipart/mixed
>     text/plain (introductory text)
>     multipart/alternative
>        multipart/related
>           text/plain (page 1 text)
>           text/plain (page 2 text)
>        multipart/related
>           application/pdf (page 1)
>           application/pdf (page 2)
> 
> Sadly, many of the MUAs currently on the market fail to display
> such a message properly.  Of course, non-MIME text-only UAs will
> display the message body.  Kmail 1.4.3 seems to do a reasonable
> job of displaying the message correctly.  But Netscape/Mozilla,
> MS Outlook [Express], Eudora, and probably others do not. Most
> of these display only the introductory text.

Just to add a data point: evolution (1.2) does fine, displaying the two
pdf files (and not displaying the two text parts).
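Evolution's choice matches the rule in RFC 2046 section 5.1.4: alternatives appear in order of increasing faithfulness, so a viewer should display the last one it supports. A tiny Python sketch of that rule follows; summarizing each alternative by the type of its root part is an assumption made for brevity (a real viewer would inspect each multipart/related part).

```python
def pick_alternative(alternative_types, supported):
    """Return the last alternative whose type the viewer supports (RFC 2046 5.1.4)."""
    for content_type in reversed(alternative_types):
        if content_type in supported:
            return content_type
    return None  # nothing displayable


alternatives = ["text/plain", "application/pdf"]  # root types of the two related sets
print(pick_alternative(alternatives, {"text/plain", "application/pdf"}))  # application/pdf
print(pick_alternative(alternatives, {"text/plain"}))  # text/plain
```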

cheers
-- vbi


Charles Lindsey | 17 Dec 11:46

Re: Interpretation of RFC 2047


In <3DFA180A.7030705 <at> alex.blilly.com> Bruce Lilly <blilly <at> erols.com> writes:

>Charles Lindsey wrote:

>> Would it not be a sensible rule to say that you should decode any occurrence
>> of =?<charset>?[BQ]?...?= (subject to the 76 character limit) in any
>> header provided:
>>     (a) it was immediately preceded by '(' or by CFWS
>>     (b) it was immediately followed by ')' or by CFWS
>>     (c) it was not contained within a quoted-string

>(d) it was not part of a MIME parameter (RFC 2047 expressly forbids 2047
>     encoding in MIME parameters; RFC 2231 provides a mechanism for parameters
>     and also extends 2047 to include language tags)

>... and more (see below)

>> Actually, some parsing is required, because an encoded word in an
>> unstructured header must have LWS (i.e. CFWS) on either side of it, whereas
>> it can also have '(' and ')' immediately next to it in a structured header.

I think a reasonable heuristic, which would nearly always do the "right
thing", would be:

NOT to decode anything within properly matched "...", <...> or [...] or
which follows a ';' which looks like the start of some MIME parameters.
And otherwise decode anything enclosed by WS or within properly matched
and nested (...).
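The heuristic can be sketched as a character scanner rather than a single regular expression. This Python version is one possible interpretation, not Charles's code: it skips "...", <...> and [...] wholesale, stops at a top-level ';' followed by something shaped like `attribute=` (a hypothetical test for "looks like MIME parameters"), and reports whitespace- or parenthesis-delimited tokens shaped like encoded-words. A token after an unmatched ')' is rejected; an unclosed '(' is not checked.

```python
import re

ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")
# Hypothetical "looks like MIME parameters" test: ';' then attribute '='
# (the '*' allows RFC 2231 names such as name*0*=)
MIME_PARAMS = re.compile(r";\s*[A-Za-z0-9_.*-]+\s*=")
SKIP_UNTIL = {'"': '"', "<": ">", "[": "]"}


def skip_group(value: str, i: int) -> int:
    """Return the index just past the delimiter that closes value[i]."""
    closer, j = SKIP_UNTIL[value[i]], i + 1
    while j < len(value):
        if closer == '"' and value[j] == "\\":
            j += 2  # quoted-pair inside a quoted-string
            continue
        if value[j] == closer:
            return j + 1
        j += 1
    return len(value)  # unterminated: skip to the end


def heuristic_candidates(value: str) -> list[str]:
    found, word, depth, i = [], [], 0, 0

    def flush() -> None:
        token = "".join(word)
        word.clear()
        # depth < 0 means we are past an unmatched ')'
        if depth >= 0 and ENCODED_WORD.fullmatch(token):
            found.append(token)

    while i < len(value):
        c = value[i]
        if c in SKIP_UNTIL:
            flush()
            i = skip_group(value, i)
            continue
        if c == ";" and depth == 0 and MIME_PARAMS.match(value, i):
            break  # the rest looks like MIME parameters: never decode there
        if c.isspace() or c in "()":
            flush()
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
        else:
            word.append(c)
        i += 1
    flush()
    return found


print(heuristic_candidates("Hello =?iso-8859-1?q?caf=E9?= (=?us-ascii?q?note?=)"))
print(heuristic_candidates(")=?se2?q?x?=("))  # []
```

As Bruce argues in reply, any scanner of this kind still lacks each field's grammar, so it can misfire in either direction.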


Bruce Lilly | 17 Dec 19:27

Re: Interpretation of RFC 2047


Charles Lindsey wrote:

> I think a reasonable heuristic, which would nearly always do the "right
> thing", would be:
> 
> NOT to decode anything within properly matched "...", <...> or [...] or
> which follows a ';' which looks like the start of some MIME parameters.
> And otherwise decode anything enclosed by WS or within properly matched
> and nested (...).

As header field contents are defined by a grammar, attempts to
decode using only regular expressions (as opposed to a parser
which accepts the defined grammar) are doomed to failure. Failures
include both false positives and false negatives, as illustrated
below.

> But there is a more interesting question, which is what agents that create
> unrecognized headers with 8bit stuff in them could usefully do. I.e. a
> user tries to create a Foobar: header with such stuff in it. This could be
> a problem in news to mail gatewaying. Treating all such headers as
> unstructured is possible, but might not do the right thing. Trying to
> recognise comments might be better (not within "...", <...> or [...]
> though).

One cannot recognise a comment unless the header field syntax is known.

    Content-Features: (& (Type="text/plain") (charset=US-ASCII) )

contains no comments.

Alan Barrett | 18 Dec 10:57

Re: Interpretation of RFC 2047


On Tue, 17 Dec 2002, Bruce Lilly wrote:
> One cannot recognise a comment unless the header field syntax is known.

One can recognise a comment from lexical analysis alone.  This was true
in RFC 822, and should still be true in RFC 2822 unless something went
wrong.

>    Content-Features: (& (Type="text/plain") (charset=US-ASCII) )
>
> contains no comments.

RFC 822 was absolutely clear that it contains a comment.  By my reading
of RFC 2822 section 3.2.3, it still contains a comment.

RFC 2912 suggests that the above Content-Features header field contains
no comments.  But RFC 2912 was published before RFC 2822, so cannot use
any sophistry about RFC 2822 perhaps having unintentionally changed the
definition of a comment.  Instead, RFC 2912 claims to depend on RFC 822,
where the definition of a comment is absolutely clear, so RFC 2912 would
have had no excuse at all for trying to modify it.

RFC 822 and 2822 did not deliberately leave open the possibility for
future header fields to redefine the comment syntax.  RFC 2912 does not
even discuss the fact that it attempts to redefine the comment syntax.
This is a fatal flaw in RFC 2912, and it's somewhat surprising that it
was not noticed before.
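Alan's position is that comments are recognisable by lexical analysis alone. A minimal Python sketch of such an RFC 822-style lexer (simplified: it tracks quoted-strings, quoted-pairs and nested parentheses, and ignores domain literals) classifies the entire Content-Features value as one comment, which is the reading that conflicts with RFC 2912:

```python
def lexical_comments(value: str) -> list[str]:
    """Return top-level (...) comments, RFC 822 style, ignoring field syntax."""
    comments, depth, start, in_quote, i = [], 0, 0, False, 0
    while i < len(value):
        c = value[i]
        if c == "\\":  # quoted-pair: the next character is literal
            i += 2
            continue
        if in_quote:
            in_quote = c != '"'  # a quoted-string ends at the next bare '"'
        elif depth == 0 and c == '"':
            in_quote = True  # '"' only opens a quoted-string outside comments
        elif c == "(":
            if depth == 0:
                start = i
            depth += 1
        elif c == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                comments.append(value[start:i + 1])
        i += 1
    return comments


print(lexical_comments('(& (Type="text/plain") (charset=US-ASCII) )'))
# ['(& (Type="text/plain") (charset=US-ASCII) )']
```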

>    Foobar: (& (Type="text/plain") (charset=US-ASCII) )
> 

Keith Moore | 18 Dec 15:35

Re: Interpretation of RFC 2047


> One can recognise a comment from lexical analysis alone.  

comments are only valid in structured fields.  so in order to
recognize a comment you have to know the set of structured fields.
and it is (perhaps unfortunately) the case that some fields
have a syntax that uses parentheses as other than comment delimiters.
if I'm not mistaken this has been the case ever since rfc 987,
which used constructs like (a) in order to encode things like @
in PrintableString fields.
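Keith's point implies that a decoder needs a per-field table before it can decide whether parentheses are comments at all. A minimal Python sketch with a deliberately tiny, hypothetical registry (a real one would have to list every field the agent knows):

```python
# Hypothetical, incomplete registry of header fields by how "(...)" is treated
UNSTRUCTURED = {"subject", "comments", "content-description"}
STRUCTURED_WITH_COMMENTS = {"from", "to", "cc", "date", "message-id", "content-type"}
PARENS_NOT_COMMENTS = {"content-features"}  # as RFC 2912 intends (disputed in this thread)


def comment_policy(field_name: str) -> str:
    """Say whether parentheses in this field may be treated as comments."""
    name = field_name.strip().lower()  # field names are case-insensitive
    if name in UNSTRUCTURED:
        return "unstructured: no comments"
    if name in STRUCTURED_WITH_COMMENTS:
        return "structured: comments allowed"
    if name in PARENS_NOT_COMMENTS:
        return "structured: parentheses are syntax"
    return "unknown: cannot tell without the field's grammar"


print(comment_policy("Content-Features"))  # structured: parentheses are syntax
print(comment_policy("Foobar"))  # unknown: cannot tell without the field's grammar
```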

> I submit that the RFC 2822 section 3.2.3 definition of a comment was
> intended to apply to all header fields

I don't think so - that would break too many things already in existence.

Keith

Alan Barrett | 18 Dec 16:37

Re: Interpretation of RFC 2047


On Wed, 18 Dec 2002, Keith Moore wrote:
> > One can recognise a comment from lexical analysis alone.  
> 
> comments are only valid in structured fields.  so in order to
> recognize a comment you have to know the set of structured fields.

Yes, that's true (both in RFC 822 section 3.1.3 and RFC 2822 section
2.2.1).

Does anybody claim that the RFC 2912 "Content-Features" header field is
unstructured?

> and it is (perhaps unfortunately) the case that some fields have
> a syntax that uses parentheses as other than comment delimiters.
> if I'm not mistaken this has been the case ever since rfc 987,
> which used constructs like (a) in order to encode things like @ in
> PrintableString fields.

By my reading of RFC 987, if a PrintableString is used in a context
where something like unquoted "(a)" could be misinterpreted as a
comment, then the entire PrintableString must be further encoded in an
RFC 822 quoted-string.  See the second paragraph on page 58 of RFC 987,
where it says "word may be encoded as 822.atom (which has a restricted
character set) or as 822.quoted-string, which can handle all ASCII
characters."
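The fallback quoted from RFC 987 (an 822.atom when the character set allows it, otherwise an 822.quoted-string) can be sketched in Python as follows; `as_822_word` is a hypothetical helper name, and the specials list is the one defined in RFC 822.

```python
# RFC 822 specials; an atom may not contain these, SPACE, or control characters
SPECIALS = set('()<>@,;:\\".[]')


def is_822_atom(s: str) -> bool:
    return bool(s) and all(33 <= ord(ch) <= 126 and ch not in SPECIALS for ch in s)


def as_822_word(s: str) -> str:
    """Emit s as an 822.atom if possible, else as an 822.quoted-string."""
    if is_822_atom(s):
        return s
    # Inside a quoted-string, '"', '\' and CR must be sent as quoted-pairs
    escaped = "".join("\\" + ch if ch in '"\\\r' else ch for ch in s)
    return f'"{escaped}"'


print(as_822_word("Smith"))  # Smith
print(as_822_word("John(a)Smith"))  # "John(a)Smith"
```

Quoted this way, "(a)" can no longer be mistaken for a comment by a lexical scanner.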

> > I submit that the RFC 2822 section 3.2.3 definition of a comment was
> > intended to apply to all header fields
>

Alan Barrett | 18 Dec 17:08

Re: Interpretation of RFC 2047


On Wed, 18 Dec 2002, Keith Moore wrote:
> > By my reading of RFC 987, if a PrintableString is used in a context
> > where something like unquoted "(a)" could be misinterpreted as a
> > comment, then the entire PrintableString must be further encoded in an
> > RFC 822 quoted-string.  
> 
> perhaps.  but this was not done in practice.

OK.  I managed to avoid encountering such cases.

--apb (Alan Barrett)

