Yangwoo Ko | 3 Jan 2011 09:09

Re: Random thought on ASCII range

Do you mean that "Basic Latin" is for a set of characters and "ASCII"
for encoding of them in 7bit?

On Sat, Jan 1, 2011 at 3:32 AM, Shawn Steele <Shawn.Steele <at> microsoft.com> wrote:
> Random thought:  Unicode calls 00-7F “C0 Controls and Basic Latin.”  Given
> that the C0 Controls are pretty much illegal in email addresses, we could
> shorten that to “Basic Latin”, which isn’t terribly wordy, and evades the
> question of whether we mean the ASCII set of characters U+0000-U+007f or the
> encoding.
>
>
>
> - Shawn
>
>
>
>  
>
> http://blogs.msdn.com/shawnste
>
> (Selfhost 7903)
>
>
>
> _______________________________________________
> IMA mailing list
> IMA <at> ietf.org
> https://www.ietf.org/mailman/listinfo/ima
>
>

Charles Lindsey | 3 Jan 2011 11:21

Re: non-EAI messages

On Fri, 31 Dec 2010 21:01:55 -0000, Troy Starr <eai <at> troystarr.net> wrote:

> Hi Shawn -
>
>> > The reason it's a terrible idea is that only some subset of the  
>> messages a given MTA handles will be EAI messages.
>>
>> I'm not sure how this is interesting?
>
> Assuming that all of the future MTAs in the message delivery chain  
> support UTF8SMTPbis, then I agree, it's not interesting.  But what  
> happens if the previous MUA/MSA/MTA applied a UTF8SMTPbis flag to the  
> entire session rather than on a per-message basis, then the current MTA  
> wants to deliver one of those messages to an MTA which doesn't support  
> UTF8SMTPbis?  I see the following possibilities:

But surely it is obvious, and certainly so after your examples, that if  
there is to be a UTF8SMTPbis flag sent to a server by a client, then it  
HAS to be a per message flag rather than a session flag? Where did this  
idea of a session flag come from?

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                       
   Web: http://www.cs.man.ac.uk/~chl
Email: chl <at> clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Dave CROCKER | 3 Jan 2011 16:10

Re: Repeating normative text from other specifications


On 12/31/2010 10:39 AM, Shawn Steele wrote:
> I would also rather only list the differences, and not repeat stuff.  That
> lends to getting out-of-sync.  Also it might make reviewing simpler since if
> it wasn't there we couldn't mess it up :)  Also we wouldn't have to worry
> about other extensions that may impact the original RFC, or updates to that
> RFC.

Exactly.

d/

-- 

   Dave Crocker
   Brandenburg InternetWorking
   bbiw.net

Dave CROCKER | 3 Jan 2011 16:23

Re: Random thought on ASCII range


On 1/3/2011 12:09 AM, Yangwoo Ko wrote:
> Do you mean that "Basic Latin" is for a set of characters and "ASCII"
> for encoding of them in 7bit?
>
> On Sat, Jan 1, 2011 at 3:32 AM, Shawn Steele<Shawn.Steele <at> microsoft.com>  wrote:
>> Random thought:  Unicode calls 00-7F “C0 Controls and Basic Latin.”

Kudos to Shawn for being both creative and diligent, in finding such a 
well-founded basis for an additional label to consider.

I had the same question as Yangwoo, but on re-reading Shawn's note I see that 
"Latin" comes from a reference to Unicode.  Therefore I think yes, Latin is the 
set of characters and "ASCII" would be an encoding of those characters.

Hence, the 2x2 table would be:

                  | Representation |  Encoding  |
                  +----------------+------------+
                  |                |            |
           Legacy |     Latin      |   ASCII    |
                  |                |            |
                  +----------------+------------+
                  |                |            |
    International |    Unicode     |   UTF-8    |
                  |                |            |
                  +----------------+------------+
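
A minimal sketch of the distinction the table draws, assuming Python; the function name and the sample addresses are hypothetical and only illustrate the idea. A repertoire is a set of code points, an encoding is how those code points become octets. For the 00-7F range the ASCII and UTF-8 encodings produce identical octets, which is why the two notions are easy to conflate.

```python
# Illustrative only: a repertoire is a set of code points,
# an encoding is a mapping from code points to octets.

def in_basic_latin(text: str) -> bool:
    """True if every code point lies in U+0000..U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)

legacy = "user@example.com"
international = "us\u00e9r@example.com"  # contains U+00E9

# Same repertoire, two encodings, identical octets.
same_octets = legacy.encode("ascii") == legacy.encode("utf-8")

print(in_basic_latin(legacy))         # True
print(in_basic_latin(international))  # False
print(same_octets)                    # True

# Outside the repertoire, the ASCII encoding has no answer at all.
try:
    international.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                       # False
```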

d/

John Levine | 3 Jan 2011 16:31

Re: non-EAI messages

> Where did this idea of a session flag come from?

From me, I suppose.

There are clearly two ways to tell whether a message to be relayed
needs an EAI server.  One is what we might call Deep Message
Inspection, scrutinize the message body and envelope to see if they
contain anything that classic SMTP can't handle.  The other is
Assertion, use whatever EAI flag the sender used.

My advice is simply to document the two possibilities and move on.
They both work, for plausible albeit different definitions of work.
Whatever we do, some people will do one, some will do the other.
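
The two approaches can be sketched roughly as follows, assuming Python. Both function names and the sample message are hypothetical, and the inspection shown is the crudest possible one (any octet above 0x7F), not anything a draft specifies.

```python
# Illustrative only: two ways a relay might decide whether the next
# hop has to be an EAI-capable server.

def needs_eai_by_inspection(envelope_addrs, message_bytes: bytes) -> bool:
    """Deep Message Inspection: look for anything outside 00-7F."""
    addrs_non_ascii = any(not addr.isascii() for addr in envelope_addrs)
    return addrs_non_ascii or not message_bytes.isascii()

def needs_eai_by_assertion(sender_flag: bool) -> bool:
    """Assertion: reuse whatever EAI flag the sender used."""
    return sender_flag

# An all-ASCII message that the sender nevertheless flagged as EAI.
addrs = ["alice@example.com"]
message = b"Subject: hi\r\n\r\nhello\r\n"

print(needs_eai_by_inspection(addrs, message))  # False
print(needs_eai_by_assertion(True))             # True
```

The two answers differ for the same message, which is one way to read "different definitions of work": inspection would let this message go to a legacy server, assertion would not.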

Regards,
John Levine, johnl <at> iecc.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. http://jl.ly

Dave CROCKER | 3 Jan 2011 16:47

Re: non-EAI messages


On 1/3/2011 7:31 AM, John Levine wrote:
>> Where did this idea of a session flag come from?
>
> From me, I suppose.
>
> There are clearly two ways to tell whether a message to be relayed
> needs an EAI server.  One is what we might call Deep Message
> Inspection, scrutinize the message body and envelope to see if they
> contain anything that classic SMTP can't handle.  The other is
> Assertion, use whatever EAI flag the sender used.
>
> My advice is simply to document the two possibilities and move on.
> They both work, for plausible albeit different definitions of work.
> Whatever we do, some people will do one, some will do the other.

The character set of a message is a property of the message.

The encoding of the message is a property of the message.

Consecutive messages could -- and likely will -- have different character sets 
and/or encodings.  A single user speaks to many different recipients.  A single 
email environment often supports many different users.  This combination means 
that the transport service needs to impose as few session limitations as possible.

A "session" flag would require establishing a new session in order to change 
character sets and/or encodings.  That makes it a highly sub-optimal choice.

Making character and encoding declarations work at a per-message level, rather 
than a per-session level is by far the better choice.
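
A rough way to see the cost, assuming Python and a deliberately simplified model: a queue of messages for one next hop, each marked with whether it needs the extension, and no other reason to open or close a session. The function names and the queue are hypothetical.

```python
# Illustrative only: count the sessions a client would need under each scheme.
from itertools import groupby

def sessions_with_session_flag(queue):
    """A session-wide flag forces a new session at every change of value."""
    return sum(1 for _ in groupby(queue))

def sessions_with_message_flag(queue):
    """A per-message flag lets one session carry any mix of messages."""
    return 1 if queue else 0

# True = message needs the extension, False = legacy message.
queue = [False, True, False, False, True]

print(sessions_with_session_flag(queue))  # 4
print(sessions_with_message_flag(queue))  # 1
```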

Bill McQuillan | 3 Jan 2011 17:52

Re: non-EAI messages


On Mon, 2011-01-03, Dave CROCKER wrote:

> A "session" flag would require establishing a new session in order to change
> character sets and/or encodings.  That makes it a highly sub-optimal choice.

> Making character and encoding declarations work at a per-message level, rather
> than a per-session level is by far the better choice.

It seems to me that two different client assertions are being discussed
here, "I understand UTF8SMTPbis" and "The current message has non-ASCII in
the header".

Originally the idea was that Deep Message Inspection was able to determine
the value of the second assertion without a flag or command. However the
issue of allowing UTF-8 in responses to VRFY and EXPN raised the issue of
the first assertion in the absence of any message to be inspected.

I fear the only clean solution is both a session state setting command and
a per-transaction flag.

-- 
Bill McQuillan <McQuilWP <at> pobox.com>

ned+ima | 3 Jan 2011 18:31

Re: non-EAI messages


> On Mon, 2011-01-03, Dave CROCKER wrote:

> > A "session" flag would require establishing a new session in order to change
> > character sets and/or encodings.  That makes it a highly sub-optimal choice.

> > Making character and encoding declarations work at a per-message level, rather
> > than a per-session level is by far the better choice.

> It seems to me that two different client assertions are being discussed
> here, "I understand UTF8SMTPbis" and "The current message has non-ASCII in
> the header".

Correct, but it is also important to note that they are essentially disjoint in
terms of usage - the former is applicable to VRFY/EXPN, the latter to message
transfer transactions. So these are really two different problems.

> Originally the idea was that Deep Message Inspection was able to determine
> the value of the second assertion without a flag or command.

Which is still true. It's not that deep inspection doesn't work - it does - but
rather that it's not a good choice operationally.

> However the
> issue of allowing UTF-8 in responses to VRFY and EXPN raised the issue of
> the first assertion in the absence of any message to be inspected.

> I fear the only clean solution is both a session state setting command and
> a per-transaction flag.


John C Klensin | 3 Jan 2011 21:19

Re: Random thought on ASCII range


--On Monday, January 03, 2011 17:09 +0900 Yangwoo Ko
<newcat <at> icu.ac.kr> wrote:

> Do you mean that "Basic Latin" is for a set of characters and
> "ASCII" for encoding of them in 7bit?

Well, if one is going to be completely precise, "Basic Latin"
and "ASCII repertoire" aren't synonyms.   We sort of got away
with it in IDNA because we did a bit of hand waving and had
already excluded all of the inconvenient punctuation, C0 control
characters, etc., from consideration in other ways.  Unicode
gets away with it because they define things that way, but that
isn't the only common usage of that term, it is a local
definition.  We can make it a local definition here too, but
only at the risk of using a term that is used differently
elsewhere in a special way.  I don't have a problem with that if
the WG is ok with it.

But the state of the C0 controls actually is important because
they are valid, if little-used, in 821/822/5321/5322 email.  The
WG has discussed, and I think agreed on, prohibiting them if EAI
features are needed (deliberately vague-- another issue), but
they are still valid in legacy, all-ASCII, addresses.

If one is comparing to Unicode and ignores the ASCII embedding
problem, there is an ASCII repertoire and a Unicode repertoire,
a native encoding for ASCII and three native encodings for
Unicode.  


Dave CROCKER | 4 Jan 2011 00:23

"non-ascii"


On 1/3/2011 12:19 PM, John C Klensin wrote:
> However, the most important, and most difficult, term needed to
> make the relevant distinctions is what we've often (and
> sloppily) called non-ASCII.

Why does the current work require a formal term for that (sub)set?

The reference to that subset of Unicode is -- or, rather, should be -- actually 
quite limited.

In the specifications, almost every reference to "non-ascii" actually means 
"Unicode", rather than "the subset of Unicode that is above U+007F."  That is, 
they appear to be making the reference to "non-ascii" when that should not be 
what is really meant.

I've posted more than one note asking this question.  Perhaps I missed the 
responses that provide a clear and compelling argument for needing the term?

d/
-- 

   Dave Crocker
   Brandenburg InternetWorking
   bbiw.net
