Keith Moore | 1 Dec 02:19 2003

Re: First strawman for UTF-8 headers proposal


If you're going to use new field names, you need to include the old 
fields (with ASCII equivalent addresses) also, for compatibility with 
existing mail handling tools.

And if you're going to do that, you might as well encode the UTF-8 
fields somehow, to keep them from causing trouble with existing tools 
(though fewer in number) that barf on currently-illegal input even in 
header fields that they do not use.

Keith

On Friday, November 28, 2003, at 09:08  PM, Adam M. Costello wrote:

> Hence I think it might be a good idea to use new field-names for the
> UTF-8-enabled fields, to reduce the chance of accidentally misleading
> old software.  If there is an algorithmic way to determine the syntax
> of the new-style Foo field given the syntax of the old-style Foo field,
> then there should also be a way to algorithmically associate the name of
> the new-style Foo field with the name Foo.

Keith Moore | 1 Dec 02:35 2003

Re: First strawman for UTF-8 headers proposal


On Saturday, November 29, 2003, at 03:43  PM, Paul Hoffman / IMC wrote:

> One thing I didn't say in the first message, which I probably should 
> have, is that it is a fairly-obvious extension of 8BITMIME. All the 
> lessons we have learned in the past decade (!) from 8BITMIME should be 
> applied with whatever I propose here.

I'm not sure how much the 8BITMIME experience applies.  At the time 
8BITMIME was adopted, the Internet was much smaller, there were many 
fewer UAs, MTAs, and other mail-handling tools (and thus less variety), 
messages travelled a simpler path (fewer firewalls, spam filters, virus 
checkers, etc.) and the vast majority of Internet users still spoke 
English - though this was quickly changing.

Also, 8BITMIME was a much less drastic change than negotiation of UTF-8 
would be now.  Partially this is because many mail readers in use at 
the time of 8BITMIME introduction were still intended for use with text 
terminals or terminal emulators (so MUAs that copied 8bit text to the 
screen often "did the right thing" even if only by accident, or because 
the user had configured the terminal emulator to use the right 
charset).  Partially this is because MTAs and other intermediaries that 
predated 8BITMIME generally did not look at message bodies - they 
looked only at the headers of messages that transited their systems.  
Since headers of 8bit MIME messages are still ASCII, supporting 8BITMIME 
didn't necessarily require any change to a tool's header-parsing code.

One simple example.  Bernstein and others have pointed out that it's 
easier to parse header fields with address lists from the right to the 
left rather than from the left to the right, because this requires less 

Keith Moore | 1 Dec 02:46 2003

Re: First strawman for UTF-8 headers proposal


On Saturday, November 29, 2003, at 09:28  PM, Adam M. Costello wrote:

> POP and IMAP, for example.  They transfer messages from one agent
> to another, like SMTP does.  Therefore, if SMTP needs a negotiation
> mechanism (UTF-8-HEADERS) to verify that the receiving agent can
> handle the new header format, then POP and IMAP will need an analogous
> negotiation mechanism for the same reason.  Maybe NNTP too, though I'm
> not clear on the relationship between news article headers and mail
> headers.  And any other protocol that transfers mail messages from one
> agent to another.

I was going to point that out, but you beat me to it.

Yes, POP and IMAP servers would need to be able to tell whether their 
clients supported UTF-8 headers on a per-session basis (since a lot of 
people use multiple mail clients) and to perform appropriate 
translation.
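
As a rough illustration of the per-session translation described above, here is a minimal Python sketch. It assumes the server has already learned, by some negotiation step that is not defined anywhere in this thread, whether the current client accepts raw UTF-8 headers; the function name and the boolean flag are hypothetical. For a client that did not opt in, the value is downgraded to RFC 2047 encoded-words, which is only appropriate for unstructured fields such as Subject.

```python
from email.header import Header, decode_header, make_header


def header_for_client(value: str, client_supports_utf8: bool) -> str:
    """Return an unstructured header value suitable for this session's client.

    client_supports_utf8 is a hypothetical per-session flag; how the
    server learns it is outside the scope of this sketch.
    """
    if client_supports_utf8 or value.isascii():
        return value
    # Fall back to RFC 2047 encoded-words for a client that did not opt in.
    return Header(value, charset="utf-8").encode()


# The same stored message, served to two different clients.
raw = header_for_client("Grüße", client_supports_utf8=True)
downgraded = header_for_client("Grüße", client_supports_utf8=False)
print(raw)
print(downgraded)
# An old but MIME-aware client can still recover the original text.
print(make_header(decode_header(downgraded)))
```

Address fields would need more care than this, since encoded-words are only allowed in the display-name and not in the address itself.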

Then again, a lot of mail transfer protocols don't have any way to do 
negotiation at all.  Batch mail transmission (like UUCP) is still used 
in some places, and there are a lot of UNIX-style filters in use (like 
procmail) where the mail is piped to, or through, a filter, that don't 
provide any good way of doing such negotiation.

This is part of why I claim that if you're going to make that drastic a 
change to the message format, you need to change the format so much 
that it will be obvious to everyone that it's a completely different 
format that has to be handled with a completely different signal path.  
(Personally I'd prefer a regular, binary format that was designed for 

Simon Josefsson | 1 Dec 03:56 2003

Re: First strawman for UTF-8 headers proposal


Keith Moore <moore <at> cs.utk.edu> writes:

> Again, I really don't think having UTF-8 headers puts us much closer
> to a solution to the problem at hand - which is to allow multiple
> representations of addresses in different languages and scripts.  (to
> which I might add -- without significant disruption of the mail
> system).   At best, providing unencoded UTF-8 headers would be
> orthogonal to a solution to the problem - actually I suspect it would
> impede adoption of a solution.

Could you define the problem you are thinking of here, more closely?
Being able to send UTF-8 in headers, after "fixing" SMTP, POP3 etc,
between aware applications, would appear to give me non-ASCII e-mail
addresses (and also get rid of RFC 2047, which is a nice side effect).
If this can be made to work, it would solve my internationalization
needs for e-mail, but it sounds as if it wouldn't satisfy your needs.

You say you want multiple representations of addresses in different
languages and scripts.  Is the "multiple" a goal in itself, that must
be present at the protocol level?  Why do you want to support multiple
scripts?  What is missing from Unicode, that warrants the added
complexities of character set tagging of data?  Applications on
non-Unicode platforms can convert to and from their native encoding.

As for language tagging, I'm not sure I see the benefits from language
tagging e-mail addresses.  They are normally treated by humans as
identifiers.  However, it wouldn't be difficult to add a language tag
to the UTF-8 strings.  I wonder if this is a critical feature though.
There are many human languages that use ASCII or trivial extensions of

Adam M. Costello | 1 Dec 05:23 2003

Re: First strawman for UTF-8 headers proposal


Paul's strawman proposal avoids defining an ACE form for local
parts.  I'd like to remind everyone that UTF-8-supporting MTAs and
address-mapping servers are harder to deploy than MUAs, and hence there
would be a price to pay for not defining ACE local parts.

Suppose I'm not familiar with ASCII characters, so you tell me your
non-ASCII address, which I remember and later type into my MUA.  If
there is no ACE form for my MUA to use, then the only way for that
message to find you is if you can associate your domain name with a mail
exchanger that supports non-ASCII directly, or with an address-mapping
server.  Either way, you need to wait for some sort of new server to
appear before you can have a usable non-ASCII address to tell me.  (And
even then, if I'm behind a firewall, I might not have access to those
servers; if I can access only a local SMTP gateway then you'll have to
wait for that to be upgraded.)

If an ACE form is defined, then you can register an IDN and give it
an MX record pointing at any existing mail-hosting service, and it
will just work (assuming I have a new MUA, but that's needed in any
approach).  You don't need to wait for any new servers to appear.

Address-mapping and UTF-8 headers can add value beyond what ACE
provides, but they don't render ACE superfluous, because they don't
match its ease of incremental deployment.
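
To make the incremental-deployment point concrete, here is a small Python sketch of what a new MUA could do today for the domain part. It uses Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII operation. The sample address is made up, and the local part is deliberately left untouched, since an ACE form for local parts is exactly the piece that is not yet defined.

```python
def domain_to_ace(address: str) -> str:
    """Convert only the domain part of an address to its ACE (IDNA) form."""
    local, _, domain = address.rpartition("@")
    # The "idna" codec applies nameprep and Punycode label by label.
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ace_domain}"


# A made-up address typed by the user into the MUA.
print(domain_to_ace("info@bücher.example"))  # info@xn--bcher-kva.example
```

The resulting domain can be looked up and relayed by existing DNS and SMTP servers without any upgrade on their side.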

AMC

Keith Moore | 1 Dec 06:26 2003

Re: First strawman for UTF-8 headers proposal


On Sunday, November 30, 2003, at 09:56  PM, Simon Josefsson wrote:

> Keith Moore <moore <at> cs.utk.edu> writes:
>
>> Again, I really don't think having UTF-8 headers puts us much closer
>> to a solution to the problem at hand - which is to allow multiple
>> representations of addresses in different languages and scripts.  (to
>> which I might add -- without significant disruption of the mail
>> system).   At best, providing unencoded UTF-8 headers would be
>> orthogonal to a solution to the problem - actually I suspect it would
>> impede adoption of a solution.
>
> Could you define the problem you are thinking of here, more closely?

I did so a couple of weeks ago in a thread called "what is the real 
problem?"

> Being able to send UTF-8 in headers, after "fixing" SMTP, POP3 etc,
> between aware applications, would appear to give me non-ASCII e-mail
> addresses (and also get rid of RFC 2047, which is a nice side effect).

Non-ASCII email addresses are worse than useless if you can't 
transcribe them - which means at a minimum being able to display them, 
read them, write them down, and type them back in.  So either you have 
to use different addresses depending on whom you're corresponding with 
(and you need a way to keep track of who can use which address), or you 
need a means for mapping between different addresses for the same 
mailbox.  Without a means for mapping between equivalent addresses, 
non-ASCII addresses would essentially be used only by people who can be 

Adam M. Costello | 1 Dec 09:05 2003

quick & dirty alternate addresses


While we're considering new header fields and new address-mapping
servers to deal with the problem of having alternate addresses, we ought
to consider how far we could get with what we already have.

Consider this:

From: "name1a <local1a@domain1a> / name1b <local1b@domain1b> /
  name1c <local1c@domain1c>" <local1a_ACE@domain1a_ACE>
To: "name2a <local2a@domain2a> / name2b <local2b@domain2b>"
  <local2a_ACE@domain2a_ACE>,
  "name3a <local3a@domain3a> / name3b <local3b@domain3b>"
  <local3a_ACE@domain3a_ACE>

This is perfectly valid syntax today.  Non-ASCII names and addresses
inside the display-name can be represented as encoded-words, and will
display legibly in MIME-enabled MUAs (even if they are IMAA-unaware).
The actual angle-addr is encoded using ACE (if necessary), not
encoded-words, and will display correctly only in an IMAA-enabled
MUA, but even if it's not displayed correctly, the user can still see
the redundant copy inside the display-name.  If one of the alternate
addresses is pure ASCII, that one could be used for the angle-addr in
order to avoid the use of ACE.  (But we still need ACE so that someone
who remembers one of the non-ASCII addresses can later type it in and
have it traverse the SMTP infrastructure.)
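
A minimal Python sketch of how a sending MUA might assemble such a field, using only the standard library. The helper name alternates_field and the sample names and addresses are invented for illustration. The angle-addr is assumed to be pure ASCII already; its local part is plain ASCII here because no ACE form for local parts exists to call. Folding of long display-names into several encoded-words is left out.

```python
from email.header import decode_header, make_header
from email.utils import formataddr


def alternates_field(alternates, ace_address):
    """Pack (name, address) alternates into the display-name of one mailbox.

    ace_address must already be pure ASCII (ACE-encoded if necessary).
    """
    display = " / ".join(f"{name} <{addr}>" for name, addr in alternates)
    # formataddr() turns a non-ASCII display-name into an RFC 2047
    # encoded-word, and quotes an ASCII one that contains specials.
    return formataddr((display, ace_address))


field = alternates_field(
    [("José", "josé@bücher.example"), ("Jose", "jose@example.com")],
    "jose@xn--bcher-kva.example",
)
print("From: " + field)
# What a MIME-enabled but otherwise unaware MUA would show as the name:
print(make_header(decode_header(field.rsplit(" <", 1)[0])))
```

Because everything rides inside an ordinary display-name, reply-to-all and address-book code paths carry the alternates along without knowing they are there.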

In today's MUAs, the user will see all the alternate addresses and can
recognize/remember the one that's easiest and ignore the rest.  When
someone replies to all, or saves an address in an address book, all the
alternates automatically go along for the ride, even if the MUA has no

Arnt Gulbrandsen | 1 Dec 10:51 2003

Re: quick & dirty alternate addresses


"Adam M. Costello" <ietf-imaa.amc+0 <at> nicemice.net.RemoveThisWord>
> 
> While we're considering new header fields and new address-mapping
> servers to deal with the problem of having alternate addresses, we ought
> to consider how far we could get with what we already have.
...
> From: "name1a <local1a@domain1a> / name1b <local1b@domain1b> /
>   name1c <local1c@domain1c>" <local1a_ACE@domain1a_ACE>

We don't have that. From my point of view, I could implement it, but it
would be roughly as much work as for the Address-Map list.

I'm jittery about two things, though. 1: What effect do really long
display-parts, say 100 characters, have on crapware? 2: What happens when
crapware simply converts to 8-bit and sends that untagged, without
knowing the New Scheme?

Address-Map seems better, since with that method the worst that can
happen is that the entire header field goes away, and that's it. As long
as the field is present, it works; if it has been stripped, e.g. by some
mailing-list software, things degrade in a comprehensible manner.

--Arnt

Martin Duerst | 1 Dec 15:41 2003

Re: quick & dirty alternate addresses


At 08:05 03/12/01 +0000, Adam M. Costello wrote:

>While we're considering new header fields and new address-mapping
>servers to deal with the problem of having alternate addresses, we ought
>to consider how far we could get with what we already have.

Hello Adam,

I won't comment on the details of your proposal. But I think it's
very good to explore the design space around alternate addresses.
While there are important interactions with Paul's UTF-8 header
proposal, both technically and in terms of deployment,
what exact scheme we pick is in many ways just a detail.

Up to now, we have the following alternate address schemes
(please tell me if I missed some):

- Keith's alternate address lookup servers
- Adam's alternate addresses in encoded words
- Paul's Address-map: header
- Calculated alternates using some ACE

For Paul's draft, that suggests splitting out the method of
getting to alternate addresses at least into a separate
chapter (maybe later even into a separate draft), so that
it's easy to replace without having to rewrite the whole
document.

Regards,    Martin.

Steve Hole | 1 Dec 16:59 2003

Re: First strawman for UTF-8 headers proposal


On Sun, 30 Nov 2003 20:19:43 -0500 Keith Moore <moore <at> cs.utk.edu> wrote:

> And if you're going to do that, you might as well encode the UTF-8 
> fields somehow, to keep them from causing trouble with existing tools 
> (though fewer in number) that barf on currently-illegal input even in 
> header fields that they do not use.

This is really the issue with the UTF-8 proposal.   Anything that 
*requires* a new SMTP extension is going to take a LONG TIME to deploy on 
the internet.   You should really seek to keep the solution space in the 
land of the MUA and possibly the delivery agent, as much as possible.

Cheers.

---
Steve Hole
Chief Technology Officer - Billing and Payment Systems
ACI Worldwide
<mailto:holes <at> ACIWorldwide.com>
Phone: 780-424-4922

