Paul Smith | 8 May 2008 13:12

header un-folding & encoded headers


I'm just struggling slightly to work out how you're meant to handle the 
following situation. It seems a bit inconsistent, so I want to check I'm 
interpreting it right.

If I have a header

Subject: hello
 there
(no space after 'hello' and one space before 'there')

I should unwrap that to

Subject: hello there

(one space between 'hello' and 'there', i.e. just remove the CRLF as 
stated in RFC 2822 2.2.3)
Yes?
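The unfolding rule in RFC 2822 section 2.2.3 comes down to deleting every CRLF that is immediately followed by a space or tab, while keeping that whitespace. A minimal sketch, assuming the header arrives as one string with CRLF line ends (the function name `unfold` is hypothetical):

```python
import re


def unfold(header: str) -> str:
    """Remove each CRLF that is directly followed by WSP (space or tab).

    The WSP itself stays, so "hello\\r\\n there" becomes "hello there".
    A CRLF followed by anything else is a real field boundary and is kept.
    """
    return re.sub(r"\r\n(?=[ \t])", "", header)


print(unfold("Subject: hello\r\n there"))  # Subject: hello there
```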

----

If I have a header

Subject: =?us-ascii?Q?hello?= =?us-ascii?Q?there?=

(two encoded words with no spaces in them, and a space between them) I 
should decode that to

Subject: hello there


Paul Smith | 8 May 2008 14:56

Re: header un-folding & encoded headers


Paul Smith wrote:
>
> I'm just struggling slightly to work out how you're meant to handle 
> the following situation. It seems a bit inconsistent, so I want to 
> check I'm interpreting it right.
Don't worry about this, I've found what I needed hidden away in the 
examples at the end of RFC 2047... Sorry, I should have looked there 
first, but I was looking for a description of the behaviour in the text, 
not just the examples...

Florian Weimer | 8 May 2008 21:46

Re: C-T-E: base64 and the real world.. what should an MUA do?


* Tony Hansen:

> If there's a padding "=" at the end of the base64, you can stop
> processing right at that point. You *could* also stop processing when
> you run into a non-whitespace character that isn't in the base64
> alphabet, such as the "-" or ":" in your example. It's not perfect.

Actually, the RFC *requires* skipping non-base64 characters.

Tony Hansen | 8 May 2008 22:22

Re: C-T-E: base64 and the real world.. what should an MUA do?


Please re-read the message that started this. It started by asking for 
strategies for dealing with poorly formatted messages where an 
intermediate mail reflector added in a signature at the end such as

	base64 stuff
	--
	this message went through the wringer

The text after the "--" consists mostly of characters from the base64 
alphabet, so once the "-" and the spaces are skipped it decodes as 
base64 and turns into strange control characters in the message.
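Python's `base64.b64decode` happens to illustrate the problem: in its default mode (`validate=False`) it discards characters outside the base64 alphabet, which is the RFC's skip rule. A small sketch, assuming a body where "aGVsbG8K" (base64 for "hello\n") is followed by the reflector's signature:

```python
import base64
import binascii

# Hypothetical body part: real base64 data, then a signature appended
# by a mail reflector after the base64 was generated.
body = "aGVsbG8K\n--\nthis message went through the wringer\n"

# validate=False (the default): "-", spaces and newlines are discarded,
# so the signature's letters are decoded as if they were more base64.
decoded = base64.b64decode(body)
print(decoded[:6])   # b'hello\n'  (the genuine content)
print(decoded[6:])   # 24 bytes of junk decoded from the signature text

# validate=True rejects the input instead of guessing.
try:
    base64.b64decode(body, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```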

Yes, the RFC requires skipping non-base64 characters. There's never been 
a question about that.

My email pointed out three things:

1) If there's an '=' character, you can *always* stop base64-decoding 
the body part right there. This is by definition of base64.

2) As a heuristic algorithm, instead of blithely skipping over 
non-base64 & non-whitespace characters, you *could* also stop 
base64-decoding at that point. My example was the "--" in the example 
message. However, doing so may cause other issues; your mileage may 
vary, and it's not a perfect solution.

3) If we were to update base64 (and I'm not suggesting that we should), 
it would be worthwhile adding an explicit end of sequence character that 
is *always present*. Right now we have the "=" sometimes present, but it 
isn't always there.
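Points 1 and 2 can be sketched as a single scan that skips whitespace but stops at the first "=" or at the first other character outside the alphabet. This is a sketch under those assumptions only; the name `heuristic_decode` is hypothetical, and re-adding the padding afterwards is my own choice so that Python's decoder accepts the truncated data:

```python
import base64
import string

# RFC 2045 Table 1 alphabet (the "=" pad character is handled separately).
B64_ALPHABET = frozenset(string.ascii_letters + string.digits + "+/")


def heuristic_decode(body: str) -> bytes:
    """Decode base64, stopping early at "=" (point 1) or at the first
    character that is neither base64 nor whitespace (point 2)."""
    kept = []
    for ch in body:
        if ch in B64_ALPHABET:
            kept.append(ch)
        elif ch.isspace():
            continue                  # line breaks etc. are always ignored
        else:
            break                     # "=" padding, "-", ":" ... end of data
    data = "".join(kept)
    if len(data) % 4 == 1:
        data = data[:-1]              # a lone 6-bit group cannot form a byte
    data += "=" * (-len(data) % 4)    # restore the padding we stopped at
    return base64.b64decode(data)


print(heuristic_decode("aGVsbG8K\n--\nthis message went through the wringer\n"))
# b'hello\n'
```

As the message says, this is not perfect: a signature that is not introduced by "--" or similar punctuation (say, a single line "thanks") is still swallowed as base64.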

Keith Moore | 9 May 2008 05:23

Re: header un-folding & encoded headers


it's simple:

1. white space between two adjacent encoded-words is not displayed.
2. white space between an encoded-word and an ordinary rfc822 atom is 
displayed.
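The two rules can be sketched in Python. Note that by rule 1, "=?us-ascii?Q?hello?= =?us-ascii?Q?there?=" displays as "hellothere"; to get a visible space, the sender has to encode it inside a word (e.g. "hello_"). This is a simplified sketch: the regex and the function names are assumptions, and it ignores RFC 2231 language tags and length limits:

```python
import base64
import binascii
import re

# Simplified RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    """Decode the payload of one encoded-word."""
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q encoding: "_" means space, "=XX" is a hex-escaped byte.
        raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")


def decode_header_value(value: str) -> str:
    """Decode encoded-words, applying the two whitespace rules."""
    out = []
    pos = 0
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # Rule 1: pure whitespace between two encoded-words is dropped.
        # Rule 2: text before the first encoded-word, or any gap that is
        # not pure whitespace, is kept as-is, including its spaces.
        if pos == 0 or gap.strip():
            out.append(gap)
        out.append(decode_word(*m.groups()))
        pos = m.end()
    out.append(value[pos:])  # trailing ordinary text, if any
    return "".join(out)


print(decode_header_value("=?us-ascii?Q?hello?= =?us-ascii?Q?there?="))  # hellothere
print(decode_header_value("hello =?us-ascii?Q?there?="))                 # hello there
```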

Keith

Paul Smith wrote:
> 
> I'm just struggling slightly to work out how you're meant to handle the 
> following situation. It seems a bit inconsistent, so I want to check I'm 
> interpreting it right.
> 
> If I have a header
> 
> Subject: hello
>  there
> (no space after 'hello' and one space before 'there')
> 
> I should unwrap that to
> 
> Subject: hello there
> 
> (one space between 'hello' and 'there' - ie just remove the CRLF as 
> stated in RFC 2822 2.2.3)
> Yes?
> 
> ----

Arnt Gulbrandsen | 9 May 2008 12:40

Re: C-T-E: base64 and the real world.. what should an MUA do?


Florian Weimer writes:
> Actually, the RFC *requires* skipping non-base64 characters.

That requirement is considerably modified by the sentence after it:

    All line breaks or other characters not
    found in Table 1 must be ignored by decoding software.  In base64
    data, characters other than those in Table 1, line breaks, and other
    white space probably indicate a transmission error, about which a
    warning message or even a message rejection might be appropriate
    under some circumstances.

My code first tries to find the end of the base64 data (as intended by 
the sender, I mean). Only after finding an end does it call the base64 
decoder. It wasn't too difficult to write.
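The code itself isn't shown, so the following is only a hypothetical sketch of the same two-phase idea: first locate the end of the base64 data line by line, then hand just that part to the decoder and warn about anything left over, as the quoted RFC text suggests. The end-of-data rules (a line with "=" padding, or the first line that isn't clean base64) and all names are assumptions:

```python
import base64
import re
import warnings

# A "clean" base64 line: alphabet characters, optionally ending in 1-2 "=".
CLEAN_LINE = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def split_base64_body(body: str) -> tuple[str, str]:
    """Phase 1: find where the sender's base64 data ends.

    Assumed rule: the data ends just after a line carrying "=" padding,
    or just before the first line that is not clean base64 (e.g. "--").
    """
    lines = body.splitlines(keepends=True)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not CLEAN_LINE.fullmatch(stripped):
            return "".join(lines[:i]), "".join(lines[i:])
        if stripped.endswith("="):
            return "".join(lines[:i + 1]), "".join(lines[i + 1:])
    return body, ""


def decode_body(body: str) -> bytes:
    """Phase 2: pass only the located base64 data to the real decoder."""
    data, leftover = split_base64_body(body)
    if leftover.strip():
        # RFC 2045: stray characters "probably indicate a transmission
        # error", so warn rather than silently decoding them.
        warnings.warn(f"ignored {len(leftover)} characters after base64 data")
    return base64.b64decode("".join(data.split()))
```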

Arnt

Frank Ellermann | 30 May 2008 15:32

2822upd <obs-FWS>


Hi, in a 2616bis discussion about the wonders of "#" cum <LWS>
(= "ASCII art with commas") I mentioned how RFC 2822 + 2822upd
solved this by putting all multiple line foldings in <obs-FWS>
and saying that this is obsolete (MUST NOT generate), compare:

<http://tools.ietf.org/html/draft-resnick-2822upd-06#section-4.2> 

| FWS             =   ([*WSP CRLF] 1*WSP) /  obs-FWS
[...]
| obs-FWS         =   1*WSP *(CRLF 1*WSP)

Unless I'm missing something, this could be made clearer with a "2":

| obs-FWS         =   1*WSP 2*(CRLF 1*WSP)
............................^
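Translating the ABNF into regular expressions shows the overlap. This is a sketch that treats WSP as space/tab and CRLF as "\r\n"; the pattern names are made up:

```python
import re

# Regex translations of the ABNF rules (WSP = SP / HTAB, CRLF = "\r\n").
WSP = r"[ \t]"
FWS_MAIN = rf"(?:{WSP}*\r\n)?{WSP}+"               # [*WSP CRLF] 1*WSP
OBS_FWS_DRAFT = rf"{WSP}+(?:\r\n{WSP}+)*"          # 1*WSP *(CRLF 1*WSP)
OBS_FWS_PROPOSED = rf"{WSP}+(?:\r\n{WSP}+){{2,}}"  # 1*WSP 2*(CRLF 1*WSP)


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


for sample in [" ", " \r\n ", " \r\n \r\n "]:
    print(repr(sample),
          matches(FWS_MAIN, sample),
          matches(OBS_FWS_DRAFT, sample),
          matches(OBS_FWS_PROPOSED, sample))
```

With "*", obs-FWS also matches a lone space and a single fold, both of which the first FWS branch already covers; with "2*" it matches only the multiple-fold case that is meant to be obsolete.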

 Frank


Gmane