johnsonhammond2 | 27 Apr 2013 18:53

Biggest Fake Conference in Computer Science

We are researchers from different parts of the world and conducted a study on  
the world’s biggest bogus computer science conference WORLDCOMP 
( http://sites.google.com/site/worlddump1 ) organized by Prof. Hamid Arabnia 
from University of Georgia, USA.

We submitted a fake paper to WORLDCOMP 2011 and again (the same paper 
with a modified title) to WORLDCOMP 2012. This paper had numerous 
fundamental mistakes. Sample statements from that paper include: 

(1). Binary logic is fuzzy logic and vice versa
(2). Pascal developed fuzzy logic
(3). Object oriented languages do not exhibit any polymorphism or inheritance
(4). TCP and IP are synonyms and are part of OSI model 
(5). Distributed systems deal with only one computer
(6). Laptop is an example for a super computer
(7). Operating system is an example for computer hardware

Also, our paper did not express any conceptual meaning.  However, it 
was accepted both times without any modifications (and without 
any reviews), and we were invited to submit the final paper and 
pay a fee of $500+ to present it. We decided to use the 
fee for better purposes than making Prof. Hamid Arabnia (Chairman 
of WORLDCOMP) rich. After that, we received a few reminders from 
WORLDCOMP to pay the fee, but we never responded. 

We MUST say that you should look at the above website if you are thinking 
of submitting a paper to WORLDCOMP.  DBLP and other indexing agencies have stopped 
indexing WORLDCOMP’s proceedings since 2011 due to its fakeness. See

Murray S. Kucherawy | 22 Mar 2013 17:54

Authentication-Results

Colleagues,

(with apologies for the cross-posting if you get more than one copy of this)

As you may have seen already, I'm working on a revision to RFC5451.

A Proposed Standard "bis" effort always benefits from describing extant implementations.  I know about the ones I've written, and about some very public uses of it (Gmail, Yahoo, for example).  If there's anyone in this audience that knows of others, I'd love to hear about it.

Reviews of the update are also welcome:

https://datatracker.ietf.org/doc/draft-kucherawy-rfc5451bis/

Thanks,
-MSK

_______________________________________________
ietf-822 mailing list
ietf-822 <at> ietf.org
https://www.ietf.org/mailman/listinfo/ietf-822
Jan Kundrát | 4 Jan 2013 17:44

Different "replying" modes in a MUA

Hi,
I'm struggling to come up with the best approach for designing the "reply" feature in Trojitá. As far as I
know, there's no single "best practices" document to follow (and many "XYZ considered harmful" ones and
an interesting one about how Evolution works [1]). Here's what I'd like to do:

- Always offer multiple ways to reply. These actions will be represented by an expandable button (i.e. a
button with an arrow on the right which performs the "reasonable thing" by default but allows you to click
the arrow to see the other actions).

- The rules for picking up the recipients will be the following:

1) If there's a List-Post header and its value is not set to "NO" (or an equivalent) and there's at least one
mailto: URL in there, the "Reply to List" is enabled and listed as the default option. When the user selects
this action, all of the Sender, From, Reply-To, Cc and Bcc are ignored.

2) "Reply All" option is always available and will generate a list of recipients using the following rules:

  - All addresses in the message's From and Reply-To will be used in the "To" field
  - All addresses in the message's To will be in the Cc
  - All addresses in the message's Cc will be in the Cc
  - All addresses in the message's Bcc will be in the Bcc

After this is complete, the lists will be "sanitized":
  - Duplicate entries in each of To/Cc/Bcc are removed
  - Addresses already in To are removed from Cc and Bcc
  - Addresses already in Cc are removed from Bcc

The List-Post and the Sender headers are ignored when doing the Reply-All thing.

The "Reply All" is the second candidate for a default (i.e. the default when "Reply List" is not available).

3) "Private Reply" option is always available and produces a message with the following recipient(s) in
the "To" field:

  - If the message contains a Reply-To, each of its addresses that is *not* also listed among the List-Post
addresses is used. If the resulting set is non-empty, the From header is ignored. This means that any
address in the Reply-To which is also listed in the List-Post is silently ignored and anything not in the
List-Post is used.
  - The Sender header is always ignored.
  - If the recipients list is empty at this point, everything from the From field is used.
  - If the resulting set contains more than one address, the duplicates are eliminated.
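To make the three rules easier to review, here is a Python sketch of them. It is a minimal illustration and assumes the headers have already been parsed into lists of bare addresses. The function names (list_post_addresses, reply_all, private_reply and so on) are hypothetical and are not Trojitá's API. The case-insensitive duplicate check is also an assumption that the scheme itself does not specify.

```python
import re
from typing import Dict, Iterable, List, Optional

Recipients = Dict[str, List[str]]

def list_post_addresses(value: Optional[str]) -> List[str]:
    """mailto: targets of a List-Post value (RFC 2369); [] if absent or "NO"."""
    if value is None or re.match(r"\s*no\b", value, re.IGNORECASE):
        return []
    addresses = []
    for url in re.findall(r"<([^>]*)>", value):
        url = url.strip()
        if url.lower().startswith("mailto:"):
            # Drop ?subject=... style parameters; percent-decoding is not done.
            addresses.append(url[len("mailto:"):].split("?", 1)[0])
    return addresses

def _unique(addresses: Iterable[str], exclude: Iterable[str] = ()) -> List[str]:
    """Drop duplicates and excluded entries, keeping first-seen order.
    Case-insensitive comparison is an assumption of this sketch."""
    seen = {addr.lower() for addr in exclude}
    result = []
    for addr in addresses:
        if addr.lower() not in seen:
            seen.add(addr.lower())
            result.append(addr)
    return result

def default_action(list_post: Optional[str]) -> str:
    return "Reply to List" if list_post_addresses(list_post) else "Reply All"

def reply_to_list(list_post: Optional[str]) -> Recipients:
    # Rule 1: Sender, From, Reply-To, Cc and Bcc are all ignored.
    return {"To": _unique(list_post_addresses(list_post)), "Cc": [], "Bcc": []}

def reply_all(from_: List[str], reply_to: List[str], to: List[str],
              cc: List[str], bcc: List[str]) -> Recipients:
    # Rule 2: List-Post and Sender are ignored.
    new_to = _unique(from_ + reply_to)
    new_cc = _unique(to + cc, exclude=new_to)
    new_bcc = _unique(bcc, exclude=new_to + new_cc)
    return {"To": new_to, "Cc": new_cc, "Bcc": new_bcc}

def private_reply(from_: List[str], reply_to: List[str],
                  list_post: Optional[str]) -> Recipients:
    # Rule 3: Reply-To minus any List-Post address; fall back to From.
    # Sender is always ignored.
    new_to = _unique(reply_to, exclude=list_post_addresses(list_post))
    if not new_to:
        new_to = _unique(from_)
    return {"To": new_to, "Cc": [], "Bcc": []}
```

Each rule is a pure function over address lists. That makes it straightforward to turn real-world scenarios into table-driven test cases.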

I am very interested in hearing what you think about this scheme. I realize that this is a topic with a huge
potential for a good flame, and I suspect there are people who have very different opinions as to what is
best. Despite that, I'll be happy to hear what issues are lurking in the algorithm I describe and what
drawbacks I'm entirely missing. Please also feel free to point me to any existing discussion as long as
it's different from [1], [2] and [3] which I've read already.

When the discussion settles, I plan to implement the outcome in Trojitá (along with a test case with plenty
of examples matching real-world scenarios). Thanks for your help!

With kind regards,
Jan

[1] http://david.woodhou.se/reply-to-list.html
[2] http://www.unicom.com/pw/reply-to-harmful.html
[3] http://woozle.org/~neale/papers/reply-to-still-harmful.html

-- 
Trojitá, a fast Qt IMAP e-mail client -- http://trojita.flaska.net/
Alessandro Vesely | 3 Jan 2013 16:20

Detecting message/rfc822 file type

Hi all,

as you certainly know, the file utility can detect a message/rfc822
mime type from a message.  It does so by recognizing the first line of
the file.  However, comparisons are qualified as case sensitive in the
magic file that I attach, except for the Delivered-To and Return-Path
that I just patched adding a "cC".  I'm about to send the patch to the
maintainer of the utility.
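The Python model below shows the effect of the patch by approximating how a string test at offset 0 treats the c and C flags. It assumes the magic(5) semantics as I understand them. With c, lower-case characters in the pattern also match upper case in the target. With C, upper-case characters in the pattern also match lower case. So "cC" together amounts to a fully case-insensitive comparison. The helper names and the FIRST_LINE_RULES table are illustrative only and are not part of file(1).

```python
def magic_string_match(pattern: str, target: str, flags: str = "") -> bool:
    """Rough model of a magic(5) 'string' test at offset 0 with c/C flags."""
    if len(target) < len(pattern):
        return False
    for p, t in zip(pattern, target):
        if p == t:
            continue
        # c: a lower-case pattern character also matches its upper-case form.
        if "c" in flags and p.islower() and t == p.upper():
            continue
        # C: an upper-case pattern character also matches its lower-case form.
        if "C" in flags and p.isupper() and t == p.lower():
            continue
        return False
    return True

# Hypothetical subset of the first-line rules from the attached mail.news file.
FIRST_LINE_RULES = [
    ("Delivered-To:", "tcC"),
    ("Return-Path:", "tcC"),
    ("Received:", "t"),
    ("From:", "t"),
]

def looks_like_rfc822(first_line: str) -> bool:
    return any(magic_string_match(p, first_line, f) for p, f in FIRST_LINE_RULES)
```

With this model, "RETURN-PATH: <>" is recognised only because of the added "cC", while "received: from ..." still fails the case-sensitive Received: rule.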

I'm not patching the "From:" entry, as I never saw a message starting
with it.  Should that entry be removed?  What other fields actually
happen to be on the top of the header?

For a short legend, the "!:mime" refers to the line just above it, and
the "0" that starts the latter is the offset; "string" implies a
string comparison and the "/t" is for text; whitespace is escaped in
the test string following it, and a message terminates the line.  The
magic file supports many other operations, including regex, search,
indirection, and more.  See the man page.  I found the latest version
(5.11) online at
http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=magic&sect=5
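Using that legend, a simple level-0 line can be split into its parts roughly as follows. This Python sketch is illustrative only. It handles the offset, the type with its "/flags" suffix, a test string whose spaces are escaped as "\ ", and the trailing message. Continuation lines (">12"), "!:mime" lines, and octal escapes such as \241 are outside its scope, and parse_magic_line is a hypothetical name.

```python
import re
from typing import Tuple

# Test string: a run of non-whitespace characters or backslash escapes ("\ ").
_TEST_AND_MESSAGE = re.compile(r"((?:\\.|\S)+)\s*(.*)")

def parse_magic_line(line: str) -> Tuple[int, str, str, str, str]:
    """Split a level-0 magic(5) line into (offset, type, flags, test, message)."""
    offset, rest = line.split(None, 1)
    type_field, rest = rest.split(None, 1)
    type_name, _, flags = type_field.partition("/")
    match = _TEST_AND_MESSAGE.match(rest)
    if match is None:
        raise ValueError("missing test string: %r" % line)
    raw_test, message = match.groups()
    # Only escaped spaces are decoded; other escapes are left untouched.
    test = raw_test.replace("\\ ", " ")
    return int(offset, 0), type_name, flags, test, message.strip()
```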

TIA for any contribution.
#------------------------------------------------------------------------------
# $File: mail.news,v 1.20 2011/12/08 12:12:46 rrt Exp $
# mail.news:  file(1) magic for mail and news
#
# Unfortunately, saved netnews also has From line added in some news software.
#0	string		From 		mail text
0	string/t		Relay-Version: 	old news text
!:mime	message/rfc822
0	string/t		#!\ rnews	batched news text
!:mime	message/rfc822
0	string/t		N#!\ rnews	mailed, batched news text
!:mime	message/rfc822
0	string/t		Forward\ to 	mail forwarding text
!:mime	message/rfc822
0	string/t		Pipe\ to 	mail piping text
!:mime	message/rfc822
0	string/tcC		Delivered-To:	SMTP mail text
!:mime	message/rfc822
0	string/tcC		Return-Path:	SMTP mail text
!:mime	message/rfc822
0	string/t		Path:		news text
!:mime	message/news
0	string/t		Xref:		news text
!:mime	message/news
0	string/t		From:		news or mail text
!:mime	message/rfc822
0	string/t		Article 	saved news text
!:mime	message/news
0	string/t		BABYL		Emacs RMAIL text
0	string/t		Received:	RFC 822 mail text
!:mime	message/rfc822
0	string/t		MIME-Version:	MIME entity text
#0	string/t		Content-	MIME entity text

# TNEF files...
0	lelong		0x223E9F78	Transport Neutral Encapsulation Format

# From: Kevin Sullivan <ksulliva <at> psc.edu>
0	string		*mbx*		MBX mail folder

# From: Simon Matter <simon.matter <at> invoca.ch>
0	string		\241\002\213\015skiplist\ file\0\0\0	Cyrus skiplist DB

# JAM(mbp) Fidonet message area databases
# JHR file
0	string	JAM\0			JAM message area header file
>12	leshort >0			(%d messages)

# Squish Fidonet message area databases
# SQD file (requires at least one message in the area)
# XXX: Weak magic
#256	leshort	0xAFAE4453		Squish message area data file
#>4	leshort	>0			(%d messages)

#0	string		\<!--\ MHonArc		text/html; x-type=mhonarc

# Cyrus: file(1) magic for compiled Cyrus sieve scripts
# URL: http://www.cyrusimap.org/docs/cyrus-imapd/2.4.6/internal/bytecode.php
# URL: http://git.cyrusimap.org/cyrus-imapd/tree/sieve/bytecode.h?h=master
# From: Philipp Hahn <hahn <at> univention.de>

# Compiled Cyrus sieve script
0       string CyrSBytecode     Cyrus sieve bytecode data,
>12     belong =1       version 1, big-endian
>12     lelong =1       version 1, little-endian
>12     belong x        version %d, network-endian
Pete Resnick | 28 Nov 2012 00:02

Re: [Editorial Errata Reported] RFC5322 (3400)

[Bcc'ed to ietf-smtp <at> ietf.org; please discuss over on ietf-822 <at> ietf.org.]

Folks,

The following erratum was posted for 5322. I'm inclined to reject it 
since this discussion actually took place during DRUMS (17-18 March 1998 
in a thread with a subject of "Small Clarification to msg-fmt-04" if you 
are inclined to look) and the consensus outcome as far as I could tell 
(as document editor) was that messages without a final CRLF were SMTP's 
problem. However, 5321 (and 2821) 4.1.1.4 says:

    The mail data are terminated by a line containing only a period, that
    is, the character sequence "<CRLF>.<CRLF>", where the first <CRLF> is
    actually the terminator of the previous line (see Section 4.5.2).
    This is the end of mail data indication.  The first <CRLF> of this
    terminating sequence is also the <CRLF> that ends the final line of
    the data (message text) or, if there was no mail data, ends the DATA
    command itself (the "no mail data" case does not conform to this
    specification since it would require that neither the trace header
    fields required by this specification nor the message header section
    required by RFC 5322 [4] be transmitted).  An extra <CRLF> MUST NOT
    be added, as that would cause an empty line to be added to the
    message.  The only exception to this rule would arise if the message
    body were passed to the originating SMTP-sender with a final "line"
    that did not end in <CRLF>; in that case, the originating SMTP system
    MUST either reject the message as invalid or add <CRLF> in order to
    have the receiving SMTP server recognize the "end of data" condition.

Allowing the originating SMTP system to reject the message as invalid 
seems in conflict with 5322 on this point. So rejecting this erratum 
will simply leave us with an erratum against 5321.
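To make this concrete, here is roughly what the originating SMTP client's side of 4.1.1.4 looks like. The Python sketch assumes the message already uses CRLF line endings. The add_missing_crlf parameter is a hypothetical switch between the two options 5321 allows: add the CRLF, or reject the message. Dot-stuffing follows section 4.5.2.

```python
CRLF = b"\r\n"

def smtp_data_payload(message: bytes, add_missing_crlf: bool = True) -> bytes:
    """Bytes to send after DATA: dot-stuffed text plus the <CRLF>.<CRLF> end marker."""
    if message and not message.endswith(CRLF):
        if not add_missing_crlf:
            # The other option RFC 5321 allows: reject the message as invalid.
            raise ValueError("final line is not terminated by CRLF")
        message += CRLF
    # Transparency (RFC 5321, 4.5.2): double a leading "." on every line.
    lines = [b"." + line if line.startswith(b".") else line
             for line in message.split(CRLF)]
    # For a non-empty message the last element is empty, so the join ends in
    # the CRLF that terminates the final line; no extra CRLF is added.
    return CRLF.join(lines) + b"." + CRLF
```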

I'm inclined to hear opinions.

pr

> The following errata report has been submitted for RFC5322,
> "Internet Message Format".
>
> --------------------------------------
> You may review the report below and at:
> http://www.rfc-editor.org/errata_search.php?rfc=5322&eid=3400
>
> --------------------------------------
> Type: Editorial
> Reported by: Christoph Anton Mitterer <calestyo <at> scientia.net>
>
> Section: 3.5.
>
> Original Text
> -------------
>     message         =   (fields / obs-fields)
>                         [CRLF body]
>
>     body            =   (*(*998text CRLF) *998text) / obs-body
>
>
> Corrected Text
> --------------
>     message         =   (fields / obs-fields)
>                         [CRLF body]
>
>     body            =   (*(*998text CRLF) *998text) / obs-body
>
> It is RECOMMENDED that message bodies are terminated by CRLF, though this is in principle not necessary
> (this does not apply to messages consisting only of a header section, as header fields are always CRLF terminated).
>
> Note however, that when transporting messages via SMTP the last lines of message bodies MUST be
> terminated by CRLF as specified in RFC 5321, section 4.1.1.4.
>
> Notes
> -----
> Hi folks.
>
> AFAIU, the definition of body allows message bodies (not header sections) that end without CRLF.
>
> RFC5321 section 4.1.1.4. however states: "The mail data are terminated by a line containing only a
> period, that is, the character sequence "<CRLF>.<CRLF>", where the first <CRLF> is actually the
> terminator of the previous line".
>
> So SMTP forbids, what this RFC allows.
> I guess the SMTP RFC can't be changed here and it makes no particular sense to restrict RFC5322 on the other hand.
>
> My suggestion was to add this clarification.
>
> Perhaps a similar one should be added to RFC5321, telling that Internet Messages themselves wouldn't
> need the last CRLF.
>
> Instructions:
> -------------
> This errata is currently posted as "Reported". If necessary, please
> use "Reply All" to discuss whether it should be verified or
> rejected. When a decision is reached, the verifying party (IESG)
> can log in to change the status and edit the report, if necessary.
>
> --------------------------------------
> RFC5322 (draft-resnick-2822upd-06)
> --------------------------------------
> Title               : Internet Message Format
> Publication Date    : October 2008
> Author(s)           : P. Resnick, Ed.
> Category            : DRAFT STANDARD
> Source              : IETF - NON WORKING GROUP
> Area                : N/A
> Stream              : IETF
> Verifying Party     : IESG
>    
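As the reporter notes, the body rule accepts a final line without CRLF. A direct Python transcription of body = (*(*998text CRLF) *998text) makes that visible. obs-body is deliberately not handled, and the function name is illustrative only.

```python
def is_rfc5322_body(body: bytes) -> bool:
    """Check body = (*(*998text CRLF) *998text); obs-body is not handled."""
    # Every CRLF ends one line; whatever follows the last CRLF (possibly
    # nothing) is the optional unterminated final line.
    for line in body.split(b"\r\n"):
        if len(line) > 998:
            return False
        # text = %d1-9 / %d11 / %d12 / %d14-127: no NUL, bare CR/LF, or 8-bit.
        if any(octet in (0, 10, 13) or octet > 127 for octet in line):
            return False
    return True
```

Both b"hello\r\nworld" and b"hello\r\nworld\r\n" pass this check. So the grammar permits what SMTP's DATA termination does not carry without an added CRLF.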

-- 
Pete Resnick <http://www.qualcomm.com/~presnick/>
Qualcomm Technologies, Inc. - +1 (858)651-4478


Jan Kundrát | 23 Nov 2012 13:01

The MIME-Version header and comments

Hi,
RFC 2045's ABNF grammar for the MIME-Version header does not contain any reference to CFWS or other tokens
which can expand to allow comments:

	version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Despite that, the RFC is pretty clear that any RFC822 comments are to be ignored [1]. Is that an error in the
ABNF grammar, or is there a generic rule for comments somewhere else?
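Whatever the answer about the grammar, the prose rule is easy to honour in a parser. Strip RFC 822 comments, which may nest and may contain quoted-pairs, before matching 1*DIGIT "." 1*DIGIT. In the Python sketch below, the function names are hypothetical. Whitespace between the tokens is tolerated, which is a lenient choice rather than something the ABNF above states. The test inputs mirror the equivalent forms RFC 2045 gives, such as 1.(produced by MetaSend Vx.x)0.

```python
import re
from typing import Tuple

_VERSION = re.compile(r"([0-9]+)\.([0-9]+)")

def strip_comments(value: str) -> str:
    """Remove (possibly nested) RFC 822 comments, honouring backslash quoted-pairs."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if depth > 0 and ch == "\\":
            i += 2  # a quoted-pair inside a comment; skip both characters
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)

def parse_mime_version(value: str) -> Tuple[int, int]:
    """Return (major, minor) from a MIME-Version field body."""
    cleaned = "".join(strip_comments(value).split())
    match = _VERSION.fullmatch(cleaned)
    if match is None:
        raise ValueError("malformed MIME-Version: %r" % value)
    return int(match.group(1)), int(match.group(2))
```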

With kind regards,
Jan

[1] http://tools.ietf.org/html/rfc2045#page-10

Jan Kundrát | 3 Nov 2012 20:49

Limiting the amount of msg-ids in the References header

Hi,
it seems to me that according to RFC5322 [1], I should never remove the earliest message-ids from the
References header when producing responses. Therefore the References header is supposed to grow
indefinitely when people keep replying to old messages.

Is that really the suggested behavior? Shouldn't I use only something like twenty most recent msg-ids
found in my parent's References?
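For reference, RFC 5322 section 3.6.4 describes the construction as follows. Start with the parent's References, or failing that its In-Reply-To when that holds exactly one msg-id. Then append the parent's Message-ID. The Python sketch below implements that rule. It also adds an optional trimming policy of the kind asked about here; that policy is purely hypothetical. It keeps the first msg-id (the thread root) plus the most recent ones, on the assumption that threading code benefits from still seeing the root.

```python
from typing import List, Optional

def build_references(parent_references: Optional[List[str]],
                     parent_in_reply_to: Optional[List[str]],
                     parent_message_id: Optional[str],
                     limit: Optional[int] = None) -> List[str]:
    """References for a reply (RFC 5322, 3.6.4), optionally trimmed."""
    if parent_references:
        refs = list(parent_references)
    elif parent_in_reply_to and len(parent_in_reply_to) == 1:
        refs = list(parent_in_reply_to)
    else:
        refs = []
    if parent_message_id:
        refs.append(parent_message_id)
    if limit is not None and len(refs) > limit:
        if limit < 2:
            raise ValueError("limit must keep at least the root and the parent")
        # Hypothetical policy: thread root plus the (limit - 1) newest ids.
        refs = refs[:1] + refs[len(refs) - (limit - 1):]
    return refs
```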

With kind regards,
Jan

[1] http://tools.ietf.org/html/rfc5322#page-26

-- 
Trojita, a fast e-mail client -- http://trojita.flaska.net/


John C Klensin | 12 Sep 2012 19:06

Re: [EAI] EAI and ADSP/DMARC


--On Tuesday, September 11, 2012 23:59 +0000 Franck Martin
<fmartin <at> linkedin.com> wrote:

> I'm moving all the threads away from EAI last call, as I think
> I feel better with the current last call documents. I have
> been advised to also post to ietf-822 <at> ietf.org. So apologies,
> if I cross post and you miss a bit of history.

Thanks.  I suspect it should be moved off the EAI list as well.
I won't insist on that (yet), but you should consider it.

> So I read RFC6530 and I'll try to summarize my understanding.
> 
> No MTA talking to another MTA will downgrade or upgrade an
> email. If the receiving MTA cannot handle UTF8, then the email
> will be bounced.

That is correct,  with one qualification.   If some MTA in the
system decides that avoiding the risk of blowback is more
important than properly delivering an NDN, the message
disappears.  I mention that in order to stress how little a MUST
means in the presence of what RFC 5321 describes as "operational
necessity".  If people are convinced that what the Standard
specifies is inappropriate or hazardous in their environment,
they will ignore the standard and do what they think is right.
In that context, the only difference between "MUST", "SHOULD"
and "Pretty Please" is whether the Standard looks silly when it
is ignored.  The choice of requirement words won't affect the
decision.

> Now the submitting MUA, will receive the bounce, and the MUA
> or the user may decide to provide an ASCII compatible email
> message, to be transmitted all the way. The RFCs do not seem
> to indicate specific ways to do a downgrade so that an
> International email can be converted into an ASCII one and
> sent. It is left to the user, maybe with some help from the
> MUA, to do this work.

Yes.  There is no "do not seem".  It isn't specified because
circumstances and remedies will differ too much depending on
what information is available and where.

> However what I see is the possibility, for the MUA to use the
> group syntax in the From: header and submit that to the MTA to
> deliver to the final MTA.

That MTA (really a Submission Server in today's vocabulary, see
RFC 6409) has to generate a backward-pointing envelope address
from somewhere to put into the MAIL command.  As far as I know,
there are only two types of methods in use: either it figures the
address out from the headers of the message that the MUA hands it,
or it gets the information out of band.  If it has to figure it
out, it is pretty much stuck: the group syntax isn't permitted
in the envelope and there is no plausible transformation from
it.  FWIW, the Submission Server is prohibited (with a MUST)
from injecting an invalid message into the public Internet.  If
the backward-pointing envelope information is transmitted out of
band, I suppose that the MUA could supply group information in
the "From:" header field and a valid (and ASCII) address in the
envelope.  But that would be at least stupid and probably
malicious.  It is hard for me to believe that specific language
banning it (language that goes beyond the "don't do this"
language that already appears in
draft-leiba-5322upd-from-group-04) would have any effect on
the author of an MUA who wants to do that or on the author of a
Submission Server who wants to allow it.

So I think you are getting very excited about a non-problem.

> If my understanding is correct, this is an issue because the
> receiving MTA will not have enough information to provide a
> check using ADSP or DMARC. This case should not be allowed.

To say part of what I think John Levine has been saying, one of
the characteristics of FUSSP proposals in the past has been the
assumption that, having invented the FUSSP, not only will the IETF
get in line and change the way email works to accommodate your
solution, but those changes will be deployed immediately worldwide
because the FUSSP is so important (see
http://www.rhyolite.com/anti-spam/you-might-be.html if you are
not familiar with it).  Not going to happen -- on the one hand,
there are lots of ways in which a receiving MTA (relay or
delivery server) may not have enough information to usefully
provide the checks you are looking for.  If the particular case
of a group in the "From:" header field is important enough, then
all an ADSP or DMARC procedure needs to do is to identify
messages containing such fields as non-validatable.  It isn't as
if there are no other circumstances that can result in a message
that can't be validated by those techniques.
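In Python, for instance, such a check can rely on the standard email package. Its address headers expose a groups attribute in which ungrouped addresses appear as groups whose display_name is None. The sketch below is illustrative. The "non-validatable" label and the dmarc_precheck function are hypothetical names, not part of ADSP or DMARC.

```python
from email import message_from_string
from email.policy import default

def from_uses_group_syntax(raw_message: str) -> bool:
    """True if the From: field contains RFC 5322 group syntax."""
    msg = message_from_string(raw_message, policy=default)
    from_field = msg["From"]
    if from_field is None:
        return False
    # Plain mailboxes show up as single-address groups whose display_name is None.
    return any(group.display_name is not None for group in from_field.groups)

def dmarc_precheck(raw_message: str) -> str:
    """Hypothetical pre-check: flag a group-syntax From: instead of rejecting it."""
    return "non-validatable" if from_uses_group_syntax(raw_message) else "evaluate"
```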

> That a receiving MTA downgrade the From: into a group syntax
> for the MUA to be able to display the email to the end user,
> is an annoyance in terms of ADSP/DMARC but as mentioned the
> fix is for the end user to upgrade its MUA. ADSP/DMARC would
> have already been applied to the email at this stage, so no
> core functionality would be lost in that transaction. The MTA
> would also have added Authentication-Results: header with the
> necessary information to indicate the result of SPF, DKIM,
> ADSP, DMARC. However this header is not easily visible to the
> end user. The DMARC spec can alert people about this case in
> Security Considerations, i.e. We could live with it.

Ok.

> So in summary, my opinion is that a submitting MUA MUST NOT be
> allowed to use the group syntax when submitting an email to an
> MTA. As a corollary, an MTA MUST NOT accept an email where the
> From: header contains the group syntax and should bounce that email.

See above.  Good luck with that.

> I think this course would keep the security benefits that
> ADSP/DMARC provide to the email environment.
> 
> Did I miss something?

Yes.  See above.

>...

    john

p.s.  I use different subscription addresses on different IETF
mailing lists.  If I were to try to reply to your message
accurately with a message that would be posted by the mailing
list expander to the EAI and ietf-822 or ietf-smtp lists, just
about the only way to do it would be to put multiple addresses
in the "From:" header, one that corresponded to each list.  Yet
another example of where that construction is potentially useful.


Franck Martin | 12 Sep 2012 01:59

EAI and ADSP/DMARC

I'm moving all the threads away from EAI last call, as I think I feel better with the current last call documents. I have been advised to also post to ietf-822 <at> ietf.org. So apologies, if I cross post and you miss a bit of history.

So I read RFC6530 and I'll try to summarize my understanding.

No MTA talking to another MTA will downgrade or upgrade an email. If the receiving MTA cannot handle UTF8, then the email will be bounced.
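The no-downgrade behaviour described here can be sketched as a relay decision in Python. This is a simplification under two stated assumptions. First, a message is taken to need SMTPUTF8 when an envelope address or the header section contains non-ASCII characters. Second, the next hop's EHLO keywords are passed in as a set. The function names are hypothetical.

```python
from typing import Iterable, List

def needs_smtputf8(mail_from: str, rcpt_to: List[str], header_section: str) -> bool:
    """Simplified test: any non-ASCII in the envelope or the header section."""
    return not (mail_from.isascii()
                and all(rcpt.isascii() for rcpt in rcpt_to)
                and header_section.isascii())

def relay_decision(mail_from: str, rcpt_to: List[str], header_section: str,
                   ehlo_keywords: Iterable[str]) -> str:
    """'relay' or 'bounce'; the relay never rewrites (downgrades) the message."""
    offered = {keyword.upper() for keyword in ehlo_keywords}
    if needs_smtputf8(mail_from, rcpt_to, header_section) and "SMTPUTF8" not in offered:
        return "bounce"
    return "relay"
```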

Now the submitting MUA will receive the bounce, and the MUA or the user may decide to provide an ASCII-compatible email message to be transmitted all the way. The RFCs do not seem to indicate specific ways to downgrade an internationalized email into an ASCII one before sending it. It is left to the user, maybe with some help from the MUA, to do this work.

However what I see is the possibility, for the MUA to use the group syntax in the From: header and submit that to the MTA to deliver to the final MTA.

If my understanding is correct, this is an issue because the receiving MTA will not have enough information to provide a check using ADSP or DMARC. This case should not be allowed.

That a receiving MTA downgrade the From: into a group syntax for the MUA to be able to display the email to the end user, is an annoyance in terms of ADSP/DMARC but as mentioned the fix is for the end user to upgrade its MUA. ADSP/DMARC would have already been applied to the email at this stage, so no core functionality would be lost in that transaction. The MTA would also have added Authentication-Results: header with the necessary information to indicate the result of SPF, DKIM, ADSP, DMARC. However this header is not easily visible to the end user. The DMARC spec can alert people about this case in Security Considerations, i.e. We could live with it.

So in summary, my opinion is that a submitting MUA MUST NOT be allowed to use the group syntax when submitting an email to an MTA. As a corollary, an MTA MUST NOT accept an email where the From: header contains the group syntax and should bounce that email.

I think this course would keep the security benefits that ADSP/DMARC provide to the email environment.

Did I miss something?

This opinion is not the opinion of the DMARC group nor my company, etc… this is an individual submission, as is anything IETF-related.

Jay Freeman (saurik) | 26 Aug 2012 02:30

Re: interpretation of whitespace inside obs-phrase

Timo,

First off, thank you so much for taking the time to reply to my question; when I ran into this problem, I was
not certain I would find anyone with domain experience who would take the time to look at it. Again: thanks!

----- "Timo Sirainen" <tss <at> iki.fi> wrote:
> On 25.8.2012, at 8.25, Jay Freeman (saurik) wrote:

> >   display-name    =   phrase
> >   phrase          =   1*word / obs-phrase
> >   obs-phrase      =   word *(word / "." / CFWS)
> >   word            =   atom / quoted-string
> >   atom            =   [CFWS] 1*atext [CFWS]
> >   quoted-string   =   [CFWS]
> >                       DQUOTE *([FWS] qcontent) [FWS] DQUOTE
> >                       [CFWS]
> 
> There is no dot-atom above.

Correct. I purposefully only provided a reference to the rules relevant for the expansion of these display
names, and a dot-atom is not a construct you can produce from a display name; those are only relevant when
producing addr-spec and msg-id.

> >   Both atom and dot-atom are interpreted as a single unit,
...
> So the "dot-atom" mentioned here doesn't apply.

Also correct. However, that is the wording of the paragraph from the specification, which I copied and pasted
verbatim: the paragraph that describes the semantics of atoms also covers dot-atoms, which in this situation
should be ignored.

> > In these cases, I would then presume, I would end up with the
> display names |JayFreeman| and |JayR.Freeman|.
> 
> But they don't say that whitespace doesn't matter for the entire
> phrase, only its individual parts. I'm not sure if anything
> specifically requires you to show the whitespace in any specific way
> though, so my parser attempts to use a single space character where
> there originally was whitespace.

This text also, however, doesn't say that the quotation marks "[don't] matter for the entire phrase, only
[the] individual [quoted strings]". In fact, it is the same sentence with the same clause: should they not
have the same semantics?

   Semantically, neither the optional CFWS outside of the quote
   characters nor the quote characters themselves are part of the
   quoted-string; the quoted-string is what is contained between the two
   quote characters.

If we are then deciding that the content outside of the tokens should be considered part of the phrase, I will
argue that we cannot differentiate between removing quotation marks and removing space characters
(which was the behavior of Java's JavaMail).

However, in the normal/common case of |"Jay Freeman (saurik)" <saurik <at> saurik.com>|, it is both common
practice and quite clear that we are supposed to remove the surrounding quotation marks. :( I am therefore
hoping to get a "we intended you to do X" answer.

> So the end result is that you get: 
> 
> 1:|Jay Freeman|
> 2:|Jay R. Freeman|
> 3:|Jay R Freeman|
> 
> And also |Jay.R.Freeman| also produces |Jay.R.Freeman| (I think many
> parsers will add whitespace after dots there).
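Two readings come up in this thread. The strict one drops all whitespace, giving |JayFreeman| and |JayR.Freeman|. Timo's puts a single space wherever whitespace or a comment appeared. The small Python tokenizer below can compare the two. It is a hypothetical sketch, not the code of Dovecot or any other parser in this thread. It treats comments like whitespace and does not attempt RFC 2047 decoding.

```python
import string
from typing import Iterator, Tuple

ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")
WSP = " \t\r\n"

def phrase_tokens(phrase: str) -> Iterator[Tuple[str, str]]:
    """Yield ("word", text), ("dot", "."), or ("gap", "") for whitespace/comments."""
    i, n = 0, len(phrase)
    while i < n:
        ch = phrase[i]
        if ch in WSP:
            while i < n and phrase[i] in WSP:
                i += 1
            yield ("gap", "")
        elif ch == "(":  # comment, possibly nested, with quoted-pairs
            depth = 0
            while i < n:
                c = phrase[i]
                if c == "\\":
                    i += 2
                    continue
                depth += (c == "(") - (c == ")")
                i += 1
                if depth == 0:
                    break
            yield ("gap", "")
        elif ch == '"':  # quoted-string: keep contents, drop quotes and backslashes
            i += 1
            chars = []
            while i < n and phrase[i] != '"':
                if phrase[i] == "\\" and i + 1 < n:
                    i += 1
                chars.append(phrase[i])
                i += 1
            i += 1  # skip the closing quote
            yield ("word", "".join(chars))
        elif ch == ".":
            i += 1
            yield ("dot", ".")
        elif ch in ATEXT:
            start = i
            while i < n and phrase[i] in ATEXT:
                i += 1
            yield ("word", phrase[start:i])
        else:
            raise ValueError("unexpected character %r in phrase" % ch)

def display_name(phrase: str, keep_gaps: bool = True) -> str:
    """keep_gaps=True: one space per gap; False: drop all whitespace (strict)."""
    out = []
    pending_gap = False
    for kind, text in phrase_tokens(phrase):
        if kind == "gap":
            pending_gap = True
            continue
        if pending_gap and keep_gaps and out:
            out.append(" ")
        pending_gap = False
        out.append(text)
    return "".join(out)
```

With keep_gaps=True this yields |Jay Freeman| and |Jay R. Freeman|; with keep_gaps=False it yields |JayFreeman| and |JayR.Freeman|.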

I went back and re-tested with 4:|Jay"."Free.man| (which now seemed interesting) to see what would
happen, including against the parsers in Dovecot, Cyrus, and Thunderbird. (I also verified that PHP's
imap really does just call out to c-client.)

4:|Jay . Free.man|:
C Dovecot
Python email.utils
PHP mailparse
Ruby TMail

4:|Jay.Free.man|:
C c-client (PHP IMAP)
C Cyrus
C++ Thunderbird
Java MIME4J

4:|Jay"."Free.man|:
Java JavaMail
Java MIME4J (Lenient)

4:|Jay "." Free man|:
C++ mimelib

I must say that I'm still not certain what the correct way to parse these primitives is; as far as I currently
understand, in other contexts (inside of e-mail addresses, for example) the whitespace around atoms
needs to be totally ignored; why not here?

(While I was at it, I also did 5:|"Jay""Freeman"|; I went ahead and built a framework to easily test all of
these different parsers at once. I could also see it interesting to do clients like Outlook and Gmail, but
that would require more automation work. ;P)

5:|Jay Freeman|:
C Dovecot
Python email.utils
PHP mailparse
Ruby TMail

5:|JayFreeman|:
C c-client (PHP IMAP)
C Cyrus
C++ Thunderbird
Java MIME4J
Java MIME4J (Lenient)

5:|Jay""Freeman|:
Java JavaMail

5:|"Jay" "Freeman"|:
C++ mimelib

Sincerely,
Jay Freeman (saurik)
saurik <at> saurik.com

Jay Freeman (saurik) | 25 Aug 2012 07:25

interpretation of whitespace inside obs-phrase

Hello. I am working on an implementation of an e-mail parsing library, and thereby am getting intimate with RFC5322.

I am consequently trying to understand how to interpret some of the semantics of whitespace while parsing
addresses, and I have come across a specific situation where I have not understood the RFC. I was thereby
hoping that someone may be able to offer their expertise.

(I will now apologize profusely if this is a misuse of this mailing list. I found a couple previous questions
by someone working on a Ruby e-mail library while going through the archives, and thereby figured that it
was at least not entirely frowned upon to ask such questions here.)

In this case, the specific examples I am working with are as follows. The part I am concerned with is the
display name (although depending on the answer to this issue I may be forced to reevaluate other things I
currently believe I understand).

1: |Jay "Freeman"|
2: |Jay R. Freeman|

For reference, here are the higher-level (non-character class) rules from RFC5322 that are important
for these two parses.

   display-name    =   phrase
   phrase          =   1*word / obs-phrase
   obs-phrase      =   word *(word / "." / CFWS)
   word            =   atom / quoted-string
   atom            =   [CFWS] 1*atext [CFWS]
   quoted-string   =   [CFWS]
                       DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                       [CFWS]

Now, one problem I run into is that the grammar is somewhat ambiguous with regard to the placement of CFWS;
however, I will use a greedy expansion for the purposes of interpreting these examples.

In the first case, I end up with a phrase, and in the second an obs-phrase. (I have included two different
versions of the second parse, as technically both are greedy but using a different order for the arguments
of the alternation in the obs-phrase rule.)

1: atom:|Jay | quoted-string:|"Freeman"|
2: atom:|Jay | atom:|R| "." atom:| Freeman|
2: atom:|Jay | atom:|R| "." CFWS atom:|Freeman|

At this point, the standard is clear that the whitespace surrounding the atoms and the quotation marks
surrounding the quoted string (in addition to any whitespace outside of those, although this example has
none) are semantically not part of the values.

   Both atom and dot-atom are interpreted as a single unit, comprising
   the string of characters that make it up.  Semantically, the optional
   comments and FWS surrounding the rest of the characters are not part
   of the atom; the atom is only the run of atext characters in an atom,
   or the atext and "." characters in a dot-atom.

   Semantically, neither the optional CFWS outside of the quote
   characters nor the quote characters themselves are part of the
   quoted-string; the quoted-string is what is contained between the two
   quote characters.  As stated earlier, the "\" in any quoted-pair and
   the CRLF in any FWS/CFWS that appears within the quoted-string are
   semantically "invisible" and therefore not part of the quoted-string
   either.

In these cases, I presume I would end up with the display names |JayFreeman| and |JayR.Freeman|. I find
the first perfectly reasonable. In the second case, however, I'm somewhat confused by the lack of
resulting whitespace.
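The literal reading above can be sketched in a few lines of Python: every whitespace run is treated as CFWS belonging to some atom or quoted-string and is therefore dropped, while atext, ".", and the contents of quoted strings are kept. This is a minimal sketch under the assumption of no comments and no folding; `strict_semantic_value` is a hypothetical name, not a function from any library mentioned here.

```python
import re

# Assumptions: no comments and no folding (CRLF) in the input, and the text
# is only the display name, already separated from the angle-addr.
ATEXT = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+")


def strict_semantic_value(text):
    """Concatenate the semantic values of the words and "." in a phrase.

    Every whitespace run outside quotes is treated as CFWS around an atom
    or quoted-string, so, per RFC5322 3.2.3 and 3.2.4, it is dropped.
    """
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t":
            pos += 1  # CFWS around a word: not part of its value
        elif ch == ".":
            out.append(".")  # obs-phrase period, kept as-is
            pos += 1
        elif ch == '"':
            pos += 1  # skip the opening DQUOTE
            while pos < len(text) and text[pos] != '"':
                if text[pos] == "\\" and pos + 1 < len(text):
                    pos += 1  # quoted-pair: the backslash is invisible
                out.append(text[pos])
                pos += 1
            if pos >= len(text):
                raise ValueError("unterminated quoted-string")
            pos += 1  # skip the closing DQUOTE
        else:
            m = ATEXT.match(text, pos)
            if m is None:
                raise ValueError(f"unexpected character {ch!r} at {pos}")
            out.append(m.group())
            pos = m.end()
    return "".join(out)


for name in ['Jay "Freeman"', "Jay R. Freeman", '"Jay"  R  Freeman']:
    print(f"|{strict_semantic_value(name)}|")
```

Under this reading the three test inputs collapse to |JayFreeman|, |JayR.Freeman|, and |JayRFreeman|, which is exactly the loss of word separation in question.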

      Note: The "period" (or "full stop") character (".") in obs-phrase
      is not a form that was allowed in earlier versions of this or any
      other specification.  Period (nor any other character from
      specials) was not allowed in phrase because it introduced a
      parsing difficulty distinguishing between phrases and portions of
      an addr-spec (see section 4.4).  It appears here because the
      period character is currently used in many messages in the
      display-name portion of addresses, especially for initials in
      names, and therefore must be interpreted properly.

Given this description, I would have assumed that the purpose of this production is to support clients
that don't feel they need to put quotation marks around names. I feel somewhat vindicated in this
understanding by section 2.1 of RFC5536:

   o  Articles are conformant if they use the <obs-phrase> construct
      (use of a phrase like "John Q. Public" without the use of quotes,
      see Section 4.1 of [RFC5322]), but agents MUST NOT generate
      productions of such syntax.

However, without the whitespace (which I am required to ignore because of the rules on how atoms are
parsed), this does not seem to be a useful obs- exception. Am I fundamentally misunderstanding this
situation? Is whitespace in these contexts actually preserved?

(If whitespace is preserved, how does one handle the whitespace between the display name and the
addr-spec, the whitespace between other arbitrary atoms in the specification, or the whitespace
preceding the display name after the "To:"?)

For completeness, I can also come up with a second way to interpret the second example, which yields
|JayR. Freeman|: neither of the semantic rules above says that the explicit CFWS element of obs-phrase
(the one after "." in the second version of parse 2) is semantically ignored, so it should presumably
become a single space under this rule from section 3.2.2:

   Runs of FWS, comment, or CFWS that occur between lexical tokens in a
   structured header field are semantically interpreted as a single
   space character.

That said, I am not certain whether these even count as "lexical tokens in a structured header field", as
my reading of other sections of the specification indicates that the structured header field itself has
only a single token, an address-list, and that inside that token there must be explicit rules about how
whitespace is parsed.
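The alternative reading can also be sketched in Python, under the same assumptions as before (no comments, no folding). Here only whitespace that directly follows a "." is taken as the explicit CFWS element of obs-phrase and turned into one space; any other whitespace is treated as CFWS belonging to an atom or quoted-string and dropped. `obs_cfws_value` is a hypothetical name, and this is one possible interpretation, not a statement of what the RFC requires.

```python
import re

# Assumptions: no comments and no folding (CRLF) in the input, and the text
# is only the display name, already separated from the angle-addr.
ATEXT = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+")


def obs_cfws_value(text):
    """Semantic value of a phrase under the alternative reading.

    Whitespace directly after a "." is the explicit CFWS element of
    obs-phrase and becomes one space (RFC5322 3.2.2); all other whitespace
    is CFWS belonging to an atom or quoted-string and is dropped.
    """
    out = []
    pos = 0
    after_dot = False
    while pos < len(text):
        ch = text[pos]
        if ch in " \t":
            while pos < len(text) and text[pos] in " \t":
                pos += 1  # consume the whole whitespace run
            if after_dot and pos < len(text):
                out.append(" ")  # explicit obs-phrase CFWS -> one space
            after_dot = False
        elif ch == ".":
            out.append(".")
            after_dot = True
            pos += 1
        elif ch == '"':
            pos += 1  # skip the opening DQUOTE
            while pos < len(text) and text[pos] != '"':
                if text[pos] == "\\" and pos + 1 < len(text):
                    pos += 1  # quoted-pair: the backslash is invisible
                out.append(text[pos])
                pos += 1
            if pos >= len(text):
                raise ValueError("unterminated quoted-string")
            pos += 1  # skip the closing DQUOTE
            after_dot = False
        else:
            m = ATEXT.match(text, pos)
            if m is None:
                raise ValueError(f"unexpected character {ch!r} at {pos}")
            out.append(m.group())
            pos = m.end()
            after_dot = False
    return "".join(out)


for name in ['Jay "Freeman"', "Jay R. Freeman", '"Jay"  R  Freeman']:
    print(f"|{obs_cfws_value(name)}|")
```

This gives |JayR. Freeman| for the second example, while the first and third examples still lose all word separation, which is part of why neither literal reading seems to match the intent of the obs-phrase note.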

Finally, for comparison, I have tried parsing these examples with a few implementations to see what they
do. (BTW, if this is interesting, I'm happy to do more work and test actual clients.) I have added a third
test, 3:|"Jay"  R  Freeman|, to stress multiple spaces.

Java JavaMail:
1:|Jay "Freeman"|
2:|Jay R. Freeman|
3:|"Jay"  R  Freeman|

Java MIME4J:
1:|Jay Freeman|
2:|Jay R. Freeman|
3:|Jay  R  Freeman|
3:|Jay R Freeman| (Lenient)

Python email.utils:
1:|Jay Freeman|
2:|Jay R. Freeman|
3:|Jay R Freeman|

Ruby TMail:
1:|Jay Freeman|
2:|Jay R.Freeman|
3:|Jay R Freeman|

PHP mailparse_rfc822_parse_addresses:
1:|Jay Freeman|
2:|Jay R.Freeman|
3:|Jay R Freeman|

PHP imap_rfc822_parse_adrlist:
1:|Jay Freeman|
2:|Jay R. Freeman|
3:|Jay  R  Freeman|

There is actually very little agreement among these results :(. When using its "lenient" parser, MIME4J
returns the same results as Python's email.utils, and Ruby's TMail returns the same results as PHP's
mailparse extension.
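The Python results above can be reproduced with a short script along these lines. It is a sketch: the address jay@example.com is a hypothetical placeholder appended so that each example is a complete mailbox, and the output may differ between Python versions.

```python
from email.utils import parseaddr

# The angle-addr is a hypothetical placeholder; only the display name matters.
EXAMPLES = {
    1: 'Jay "Freeman" <jay@example.com>',
    2: "Jay R. Freeman <jay@example.com>",
    3: '"Jay"  R  Freeman <jay@example.com>',
}

for number, header_value in EXAMPLES.items():
    display_name, address = parseaddr(header_value)
    print(f"{number}:|{display_name}|")
```

Judging from the CPython source, `parseaddr` relies on the legacy parser in `email._parseaddr`, which treats "." as part of an atom inside a phrase, drops the whitespace and quotation marks between words, and joins the collected words with single spaces. That would explain why it yields |Jay R. Freeman| and |Jay R Freeman| rather than either of the literal readings discussed earlier.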

(Incidentally, I believe the PHP imap extension returns different results from the PHP mailparse
extension because the imap extension calls out to c-client, whereas the mailparse extension was coded
in-house.)

So, again, if anyone here is willing to help me understand what the correct behavior here is, I would be most
appreciative. ;P

Sincerely,
Jay Freeman (saurik)
saurik <at> saurik.com
_______________________________________________
ietf-822 mailing list
ietf-822 <at> ietf.org
https://www.ietf.org/mailman/listinfo/ietf-822

