Claus Färber | 1 Mar 01:00

Re: gzip-8bit


Keith Moore <moore <at> cs.utk.edu> wrote:
> bullfeathers.  the way MIME defines it, it's a transfer encoding.
> it's the means used to convert from canonical form to on-the-wire
> form.  putting compression anywhere else would break MIME.

It's very likely that a new CTE would break MIME, too.

Another idea would be to create an Extended MIME standard (EMIME) that
may be used only within a confined environment. This would allow adding
new ``Extended'' CTEs, new headers such as Content-Encoding (from HTTP),
and an extension mechanism that allows further extensions.

It could go with an application/message-partial type (similar to  
message/partial but with EMIME content and without the restriction that  
it can't be encoded with base64 or qp), which can be used to encapsulate  
EMIME entities within MIME environments.

Some rough drafts (very rough, and including some material that I don't
consider to be a good idea any longer) can be found at:
http://www.faerber.muc.de/temp/20020404-binary-posts.html

It also includes a draft for a quoted-binary Extended CTE that allows
efficient transfer of binary data over 8BITMIME environments.

Claus
-- 
http://www.faerber.muc.de/


Keith Moore | 1 Mar 01:06

Re: Unicode newsgroup name options

> Rejecting bytes 128-255 as an anti-spam technique is idiotic. If any
> noticeable percentage of the Internet started doing this, spammers would
> simply stop using those bytes. (Most of them never even started.)

the number of spammers doing this is not a concern.  a significant percentage
of spam messages are doing this.  and since the messages themselves are
illegally formatted and unlikely to be correctly displayed, it seems perfectly
reasonable to treat them as trash.  it's one of the few effective filtering
techniques with zero false positives.

D. J. Bernstein | 1 Mar 01:30

Re: Unicode newsgroup name options

Keith Moore writes:
  [ rejecting bytes 128-255 as an anti-spam technique ]
> zero false positives

Wrong. A huge number of non-spam messages use bytes 128-255.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

ned+ietf-822 | 1 Mar 01:54

Re: gzip-8bit


> There's a problem hiding in the claims that deflate-8bit and friends can
> be sent over 8BITMIME-supporting channels. Consider a MIA sending a
> message with deflate-8bit body to a SMTP server announcing 8BITMIME.

> If the server isn't prepared to reject unknown charsets in the body of
> messages (and which server is?), it will accept the message and thus
> the responsibility for delivery.

> The next hop doesn't support 8BITMIME. No problem. This is an
> intelligent server that acts by the gateway model of rfc1652 and tries
> to recode the offending body parts. Whooops... bounce. :-(

> Am I missing something?

This is nothing new; nothing requires that servers in possession of an 8bitMIME
message currently be capable of downgrading. Per RFC 1652 section 3, they
already have the option of returning a bounce if the next hop doesn't support
8bitMIME.
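The two options described above (downgrade or bounce) can be sketched as a small decision function. This is only an illustration, not code from any real MTA: the names `RelayAction` and `choose_relay_action` and the `can_reencode` flag are hypothetical. `can_reencode` stands for whether the relay knows how to convert every 8-bit body part to a 7-bit encoding, which it cannot do for a transfer encoding it does not recognize.

```python
from enum import Enum


class RelayAction(Enum):
    SEND_AS_IS = "send as-is"
    DOWNGRADE = "re-encode to 7bit, then send"
    BOUNCE = "return a delivery failure"


def choose_relay_action(has_8bit_content: bool,
                        next_hop_8bitmime: bool,
                        can_reencode: bool) -> RelayAction:
    """Pick what a relay holding a message may do for the next hop."""
    # Nothing to worry about if the message is 7-bit clean or the
    # next hop announced 8BITMIME.
    if not has_8bit_content or next_hop_8bitmime:
        return RelayAction.SEND_AS_IS
    # The next hop is 7-bit only: convert if we know how to ...
    if can_reencode:
        return RelayAction.DOWNGRADE
    # ... otherwise treat it as a permanent delivery failure.
    return RelayAction.BOUNCE
```

Under these assumptions, a body in an encoding the relay does not understand ends up in the `BOUNCE` branch whenever the next hop lacks 8BITMIME, which is the scenario raised in the quoted question.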

It is entirely possible that this encoding will fail to deploy if 8bitMIME
support isn't sufficiently widespread. But it is even more likely that this
will fail to catch on if clients don't add the necessary support.

But we won't know until we try.

				Ned

Keith Moore | 1 Mar 03:40

Re: Unicode newsgroup name options


>   [ rejecting bytes 128-255 as an anti-spam technique ]
> > zero false positives
> 
> Wrong. A huge number of non-spam messages use bytes 128-255.

I have yet to see a single one.  of course, since I only speak English,
there's a very high probability that any message that I see that contains
those bytes in a message header is spam.  at any rate, they're all in 
violation of the standards, and as far as I'm concerned, it's reasonable to
outright reject any message that is so flagrantly broken.
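For concreteness, the check being argued about can be sketched as below. This is a hypothetical illustration, not any MTA's actual filter: the function name `has_raw_8bit_in_header` is made up, and the sketch assumes the raw message is available as bytes and that the header ends at the first empty line. It only detects unencoded bytes 128-255 in the header; whether rejecting on that basis is wise is exactly what the thread disputes.

```python
import re

# The header section ends at the first empty line (CRLF or bare LF).
_HEADER_END = re.compile(rb"\r?\n\r?\n")


def has_raw_8bit_in_header(raw_message: bytes) -> bool:
    """Return True if any header byte is in the range 128-255."""
    match = _HEADER_END.search(raw_message)
    # With no empty line, the whole message is treated as header.
    header = raw_message if match is None else raw_message[:match.start()]
    return any(byte >= 0x80 for byte in header)
```

Note that a header using RFC 2047 encoded-words, such as `Subject: =?iso-8859-1?Q?caf=E9?=`, is pure ASCII on the wire and is not flagged, and that 8-bit bytes in the body are ignored.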

Bruce Lilly | 1 Mar 06:22

Re: gzip-8bit


ned+ietf-822 <at> mrochek.com wrote:

> Very unlikely, actually. CTE has always been extensible and has always allowed
> X- tokens. The rules for handling unknown CTEs are also well defined.

Actually, there is a problem; the rule seems to be the following
from RFC 2045, section 6.4:

    Any entity with an unrecognized Content-Transfer-Encoding must be
    treated as if it has a Content-Type of "application/octet-stream",
    regardless of what the Content-Type header field actually says.

That basically means that the content is treated as opaque. Obviously,
one cannot decode an unknown encoding, so that part of the CTE is
moot.  But the other part, viz. the content domain (7bit, 8bit, or
binary) is unspecified.  So an MTA presented with a message with
unknown CTE cannot tell (based on MIME header fields) whether or
not the next stage of transfer requires 8BITMIME or binary transport
or plain old 7bit transport.
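The rule and the gap it leaves can be illustrated with a short sketch. The helper names `effective_content_type` and `encoded_form` and the table `ENCODED_FORM` are hypothetical, headers are assumed to be a plain dict, and parsing is deliberately naive. The table records what kind of data each standard encoding produces; an unrecognized token has no entry, so the MIME header fields alone say nothing about it.

```python
from typing import Dict, Optional

# What the encoded body consists of for each standard CTE.
ENCODED_FORM = {
    "7bit": "7bit",
    "quoted-printable": "7bit",
    "base64": "7bit",
    "8bit": "8bit",
    "binary": "binary",
}


def _cte(headers: Dict[str, str]) -> str:
    lowered = {name.lower(): value for name, value in headers.items()}
    # An absent Content-Transfer-Encoding field means "7bit".
    return lowered.get("content-transfer-encoding", "7bit").strip().lower()


def effective_content_type(headers: Dict[str, str]) -> str:
    """Apply the 'unrecognized CTE means application/octet-stream' rule."""
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-type", "text/plain; charset=us-ascii")
    return declared if _cte(headers) in ENCODED_FORM else "application/octet-stream"


def encoded_form(headers: Dict[str, str]) -> Optional[str]:
    """Return '7bit', '8bit' or 'binary', or None when it cannot be known."""
    return ENCODED_FORM.get(_cte(headers))
```

For a part labelled `Content-Transfer-Encoding: deflate-8bit`, this sketch reports `application/octet-stream` as the effective type and `None` for the form of the encoded data.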

I suppose the MTA in question could peek inside the message body to
determine the domain. Unless of course it's not all available (e.g.
it's in a stream too large to be held in memory) -- oops.

If compression is to be considered part of the CTE, then it is
certainly conceivable that some CTE may have binary domain, and
lacking explicit information an assumption of binary domain is
safe (it won't result in foisting incompatible content on the
next hop).  OTOH if compression is considered a separate
[...]

ned+ietf-822 | 1 Mar 17:31

Re: gzip-8bit


> ned+ietf-822 <at> mrochek.com wrote:

> > Very unlikely, actually. CTE has always been extensible and has always allowed
> > X- tokens. The rules for handling unknown CTEs are also well defined.

> Actually, there is a problem; the rule seems to be the following
> from RFC 2045, section 6.4:

>     Any entity with an unrecognized Content-Transfer-Encoding must be
>     treated as if it has a Content-Type of "application/octet-stream",
>     regardless of what the Content-Type header field actually says.

> That basically means that the content is treated as opaque. Obviously,
> one cannot decode an unknown encoding, so that part of the CTE is
> moot.  But the other part, viz. the content domain (7bit, 8bit, or
> binary) is unspecified.  So an MTA presented with a message with
> unknown CTE cannot tell (based on MIME header fields) whether or
> not the next stage of transfer requires 8BITMIME or binary transport
> or plain old 7bit transport.

First of all, a transport MTA is normally not supposed to need to know the
domain of the message as a whole or of its various parts. Only
the range of the message as a whole is supposed to matter to the transport. And
the range is determined by the mechanism used to get the message to the MTA in
the first place. That could be conventional SMTP (7bit), 8bitMIME (8bit),
binary SMTP (binary), or some other mechanism that lies outside of the
standards. 

Of course the mechanism may not represent the true state of affairs, e.g.
[...]

D. J. Bernstein | 1 Mar 19:51

Re: Unicode newsgroup name options


Keith Moore writes:
> since I only speak English, there's a very high probability that any
> message that I see that contains those bytes in a message header is spam.

Brilliant, Keith. Next you'll be suggesting that MTA authors throw away
Chinese mail as an anti-spam technique with ``zero false positives.''

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Marc Mutz | 1 Mar 20:37

Re: gzip-8bit

On Friday 28 February 2003 20:03, Bruce Lilly wrote:
<snip>
> A UA issue:
> Currently, with QP and Base64, binary content can simply
> be Base64 encoded (QP isn't terribly effective for binary
> content, nor is Base64 terribly useful for text which has
> a few 8-bit octets).  So the UA can pretty much do the
> right thing without user interaction or user
> sophistication.

base64 remains the safe bet even with deflate-* added...

>  Of course, a knowledgeable user could
> apply compression before attaching.  Unfortunately in
> practice there exists a large variety of compression
> and/or packaging mechanisms (e.g. just the other day I
> received an attachment in "Stuffit" format, and I had
> to track down a decoder for that).
>
> With a larger choice for binary attachment encodings, things
> get a bit more complicated.  The UA can't necessarily
> determine whether the MTA will support 8BITMIME (the user
> may be off-line).

The UA should record the capabilities of the SMTP server for that
account or default to a 7bit domain CTE.
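A rough sketch of that suggestion follows. The function name `choose_binary_cte` is hypothetical, and "deflate-8bit" is only the encoding name proposed in this thread, not a registered CTE. The recorded capabilities are assumed to be the EHLO keyword lines the UA saved the last time it talked to the account's SMTP server, or `None` if it has never connected.

```python
from typing import Iterable, Optional


def choose_binary_cte(recorded_extensions: Optional[Iterable[str]]) -> str:
    """Choose a CTE for a binary attachment while possibly off-line."""
    # No record for this account: fall back to the 7-bit-safe choice.
    if recorded_extensions is None:
        return "base64"
    # EHLO lines may carry parameters ("SIZE 1000000"); keep the keyword only.
    keywords = {line.split()[0].upper()
                for line in recorded_extensions if line.strip()}
    return "deflate-8bit" if "8BITMIME" in keywords else "base64"
```

The record only describes the first hop, so a later 7-bit-only hop can still force a downgrade or a bounce.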

> Yet a choice must be made among the
> available encodings. Negotiation along the lines
> of RFC 3297 might be useful.  Simply asking the user to
[...]

Adam M. Costello | 1 Mar 21:28

Re: gzip-8bit


ned+ietf-822 <at> mrochek.com wrote:

> We've been over this many times before.  Compression cannot be a
> separate attribute because current agents assume that once the CTE
> is removed they have the data identified by the content-type in hand
> and no further processing is required.  Adding compression as, say, a
> different header is therefore something that is guaranteed to cause
> massive breakage.  Whereas adding compression through a new CTE is
something that should not cause problems with standards-compliant
agents.

Except for the 1000:1 expansion risk that could bring down transport
agents that aren't sufficiently paranoid when altering the
transfer-encoding.  And it's a bit gross to have to define N*M transfer
encodings to support N escape mechanisms (quoted-printable, base64,
escaped-8bit) and M compression schemes (gzip, deflate, bzip2, ...).
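One way an agent could be "sufficiently paranoid" is to cap the decompressed size. The sketch below is an assumption-laden illustration: it uses zlib-wrapped deflate data via Python's `zlib` module, whereas the thread does not settle the exact stream format, and the names `bounded_inflate` and `ExpansionLimitExceeded` are made up for the example.

```python
import zlib


class ExpansionLimitExceeded(Exception):
    """Raised when decompressed output would exceed the allowed size."""


def bounded_inflate(data: bytes, max_output: int) -> bytes:
    """Decompress zlib data, refusing to produce more than max_output bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most one byte more than allowed; getting that extra byte
    # proves the limit would be exceeded without inflating the rest.
    output = decompressor.decompress(data, max_output + 1)
    if len(output) > max_output:
        raise ExpansionLimitExceeded(
            f"output exceeds {max_output} bytes; refusing to continue")
    return output
```

With a limit in place, a body of a million zero bytes that compresses to about a kilobyte is rejected after a bounded amount of work instead of being expanded in full.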

Here's a way to separate the compression from the escaping in a
backward-compatible way.  I don't claim that this is the best way, but
it's something to consider.

Content-Type: encoded/gzip
Content-Decoded-Type: application/postscript
Content-Disposition: attachment; filename="foo.ps.gz"
Content-Transfer-Encoding: escaped-8bit

This would remove any temptation for transport agents to decompress the
body.
