BCP proposal: regular expressions for Internet Mail identifiers
Sean Leonard <dev+ietf <at> seantek.com>
2016-03-22 22:54:27 GMT
Greetings IETF-SMTP Gods and Denizens (and dispatch):
Over the winter I worked on a new Internet-Draft that I would like to
propose that the IETF adopt: Regular Expressions for Internet Mail. The
draft focuses on two identifiers: email addresses and Message-IDs.
The purpose of this standard (proposed as a Best Current Practice) is to
have *IETF-vetted* expressions that implementers and non-mail standards
authors can plug-and-chug without futzing with trying to interpret 40
years of (occasionally conflicting and arcane) RFCs and implementation
lore. There are many non-mail systems out there (read: nearly every web
app, reservation system, customer database, etc. on Earth) that use or
consume email addresses as identifiers, and their inability to accept
the most obvious valid characters (like "+" or even "-"; I have used
apps that do not even accept "-") is a great source of interoperability
problems. (This document is also relevant to some other threads about
the nature of email address identifiers in security artifacts such as
certificates, PGP keys, and DNS records: anyone who is vouching for an
email address ought to be sure that they are recording something that
actually is a valid email address in the first place.) We should get
this right now, before Unicode/EAI makes interoperability issues 50000x
more expensive to correct.
The document is not meant to modify the mail standards, but merely to
reflect and track them as they are updated over time.
As a first draft, the document is in rough shape and has extensive notes
about issues that came up during R&D but have yet to be addressed.
Significant areas that need adequate treatment include:
1. the impact of Unicode (EAI) on identifiers.
2. handling domain names, which comprise 50% of an email address, but
perhaps 85% of the complexity when Unicode gets involved.
3. "deliverable email address" (complying with the modern SMTP
infrastructure) vs. other kinds of email addresses (Internet Message
Format, historic forms).
4. regular expression engines and grammars (i.e., which grammars to use,
which are widely used and produce uniform results).
5. efficiency of the regular expressions.
6. different expressions for validation (testing), part extraction
(capturing groups), decoding, encoding, and searching through text.
7. test vectors.
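To make the distinction between validation, part extraction, and test
vectors concrete, here is a minimal sketch in Python. It is NOT the
expression from the draft. It assumes an ASCII-only "dot-atom" local part
(the atext characters of RFC 5322) and a domain made of
letter-digit-hyphen labels, and it ignores quoted strings, domain
literals, length limits, and EAI. The names VALIDATE, CAPTURE, is_valid,
split_address, and TEST_VECTORS are made up for illustration.

```python
import re

# Assumption: ASCII-only subset. atext characters as listed in RFC 5322.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom local part: runs of atext separated by single dots.
LOCAL = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# One DNS label: starts and ends with a letter or digit, at most 63 chars.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
DOMAIN = LABEL + r"(?:\." + LABEL + r")*"

# Validation (testing): no capturing groups needed.
VALIDATE = re.compile(LOCAL + "@" + DOMAIN)
# Part extraction: same grammar, with named capturing groups.
CAPTURE = re.compile("(?P<local>" + LOCAL + ")@(?P<domain>" + DOMAIN + ")")


def is_valid(addr):
    """Return True if the whole string matches the simplified grammar."""
    return VALIDATE.fullmatch(addr) is not None


def split_address(addr):
    """Return (local, domain) for a matching address, else None."""
    m = CAPTURE.fullmatch(addr)
    return (m.group("local"), m.group("domain")) if m else None


# Hypothetical test vectors: (input, expected validity).
TEST_VECTORS = [
    ("user+tag@example.com", True),     # "+" is valid atext
    ("first-last@example.org", True),   # so is "-"
    ("a..b@example.com", False),        # consecutive dots
    (".a@example.com", False),          # leading dot
    ("user@-example.com", False),       # label starts with a hyphen
    ("user@example.com\n", False),      # trailing newline
]

if __name__ == "__main__":
    for addr, expected in TEST_VECTORS:
        status = "ok" if is_valid(addr) == expected else "MISMATCH"
        print(status, repr(addr))
```

Using fullmatch rather than anchoring with "$" is a deliberate choice in
this sketch: in Python, "$" also matches just before a trailing newline,
which is one example of how the same expression can behave differently
across engines.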
Hopefully the adoption of this work as an IETF item, coupled with input
from those with extensive experience
(Thanks to John Levine, Pete Resnick, and others for taking initial
questions and discussion on the topic.)
Discussion welcome. Thanks.
-------- Forwarded Message --------
Subject: New Version Notification for draft-seantek-mail-regexen-00.txt
Date: Mon, 21 Mar 2016 16:55:53 -0700
From: internet-drafts <at> ietf.org
A new version of I-D, draft-seantek-mail-regexen-00.txt
has been successfully submitted by Sean Leonard and posted to the
Title: Regular Expressions for Internet Mail
Document date: 2016-03-21
Group: Individual Submission
Internet Mail identifiers are used ubiquitously throughout computing
systems as building blocks of online identity. Unfortunately,
incomplete understandings of the syntaxes of these identifiers have
led to interoperability problems and poor user experiences. Many
users use specific characters in their addresses that are not
properly accepted on various systems. This document prescribes
normative regular expression (regex) patterns for all Internet-
connected systems to use when validating or parsing Internet Mail
identifiers, with special attention to regular expressions that work
with popular languages and platforms.
ietf-smtp mailing list
ietf-smtp <at> ietf.org