Reject messages on backup mail exchangers when primary MX is online
Evert Mouw <post <at> evert.net>
2013-02-23 17:57:43 GMT
Just an idea... maybe it's silly or not worth investigating, but I
thought it wouldn't hurt to ask for comments.
# Reject messages on backup mail exchangers when primary MX is online
- Cascading? Like: test all MXs with lower preference number.
- Check with [RFC 5321].
- Error messages: correct numbers? Ask for opinions.
  Negative replies can be permanent (5xx codes) or transient (4xx codes).
- Discuss in newsgroup.
- Write RFC.
Author: Evert Mouw <post <at> evert.net>
- 2013-02-06 first version
Multiple incoming mail servers (Mail eXchangers; MX) may be configured
for a DNS domain. The MX with the highest priority, that is, the
*lowest* preference number, MUST be contacted first ([RFC 5321]) by the
mailserver of another domain if that other mailserver wants to send mail.
The MX with the lowest preference number is therefore called the
*primary* MX. The other MXs are *backup* servers. The backup servers
will be used by the sending party if the primary MX cannot be reached
for whatever reason, for example because of a network configuration
error or a hardware failure.
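
The ordering rule above can be illustrated with a minimal sketch. The record type, the function names, and the `example.org` hostnames are hypothetical and only serve the illustration; a real sender would obtain the records from a DNS lookup.

```python
from typing import List, NamedTuple, Tuple


class MXRecord(NamedTuple):
    preference: int  # lower number means higher priority
    host: str


def split_primary_and_backups(
    records: List[MXRecord],
) -> Tuple[List[MXRecord], List[MXRecord]]:
    """Split MX records into primary (lowest preference number) and backups."""
    if not records:
        raise ValueError("no MX records given")
    # Sort so that the lowest preference number comes first.
    ordered = sorted(records, key=lambda record: record.preference)
    lowest = ordered[0].preference
    primaries = [r for r in ordered if r.preference == lowest]
    backups = [r for r in ordered if r.preference > lowest]
    return primaries, backups


records = [
    MXRecord(20, "mx2.example.org"),
    MXRecord(10, "mx1.example.org"),
    MXRecord(30, "mx3.example.org"),
]
primaries, backups = split_primary_and_backups(records)
```

Here `primaries` holds only `mx1.example.org`, and `backups` lists the other two hosts in the order a sender would fall back to them.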
Spammers have found that sending spam to both the primary MX and one or
more backup MXs is an effective way to increase the number of spam
messages being delivered. Often, backup MXs are less well configured to
stop spam. Even if they are well configured, clever spam messages might
be delivered by both the primary and the backup MXs, resulting in
multiple spam messages for a receiver.
Many mail administrators have responded by removing the backup MXs
altogether. In their view, the added costs of more spam, electricity and
maintenance outweigh the benefit of having a backup MX available. When
the primary MX is offline, the mails for the domain will then be queued
by the mailservers of the sending parties.
This has major drawbacks. First, it places the costs of storage and
retries on the sending party, while that party is not responsible for
the downtime of the receiving mailserver. Second, when the primary MX is
offline for too long, messages might be lost. Third, messages might be
delayed for a long time, even after the primary MX has come back online.
The proper way to address this issue is to deny the use of a backup MX
when the primary MX is online.
## Bad solutions
Some administrators run a periodic script (e.g. cronjob) on the backup
MX to test if the primary MX is online (e.g. netcat to port 25 of the
primary MX). If the primary MX is offline, they dynamically add a
firewall rule to allow incoming port 25 traffic, or they start the SMTP
daemon. When the primary comes online again, they block incoming port 25
traffic, flush the queue, and stop the SMTP daemon.
This causes "mailserver unreachable" errors on legitimately configured
mailservers when the primary MX is online but not reachable due to some
network error on the side of the sender.
It also is a bad solution for multiple-domain mail exchangers.
## One interesting solution
An alternative implementation uses the ETRN command. The primary MX
periodically sends an ETRN request to the backup MXs. The backup MXs
will only become active when they have not received such an ETRN request
for a preconfigured period of time (e.g., 5 minutes). See the
## Proposed solution
When a message arrives, a backup MX should contact the primary MX to
determine whether it is online (e.g., using HELO / EHLO). If it is
online, then the backup MX should reply with error code 551,
indicating that the sending party should try the primary MX first.
SMTP Error 551 <domain> Try the primary MX first while it is online;
please try <primary MX>
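
A minimal sketch of this check, assuming the backup MX can call out to a small helper before accepting a message. The function names and hostnames are hypothetical. The probe uses Python's standard `smtplib` to send EHLO, and the reply text follows the wording proposed above.

```python
import smtplib
from typing import Callable, Optional


def primary_is_online(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if the host accepts a connection and answers EHLO with 2xx."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _ = smtp.ehlo()
    except OSError:
        # Covers refused connections, timeouts and SMTP-level failures.
        return False
    return 200 <= code < 300


def backup_mx_reply(
    domain: str,
    primary_host: str,
    probe: Callable[[str], bool] = primary_is_online,
) -> Optional[str]:
    """Return the 551 rejection line, or None if the backup should accept."""
    if probe(primary_host):
        return (
            f"551 {domain} Try the primary MX first while it is online; "
            f"please try {primary_host}"
        )
    return None


# Usage with stub probes, so the example runs without network access.
rejected = backup_mx_reply("example.org", "mx1.example.org", probe=lambda h: True)
accepted = backup_mx_reply("example.org", "mx1.example.org", probe=lambda h: False)
```

Passing the probe as a parameter keeps the decision logic testable. A real deployment would probably cache the probe result for a short time instead of contacting the primary MX for every incoming message.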
If the sender keeps making such requests, then the backup MX could
optionally blacklist the sender for a period of time, e.g. by rejecting
it with error code 550 and enhanced status code 5.7.1, indicating that
the sender has tried too often to misuse the backup MX.
SMTP Error 550 5.7.1 You tried me, a backup MX, too often while the
primary MX is online.
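
The escalation from 551 to 550 5.7.1 could be sketched as follows. The class name, the threshold of five attempts, and the one-hour window are assumptions made for illustration; the proposal does not specify them.

```python
from collections import defaultdict, deque


class MisuseTracker:
    """Count recent 551 rejections per sender and escalate to 550 5.7.1."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 3600.0):
        self.max_attempts = max_attempts      # assumed threshold
        self.window_seconds = window_seconds  # assumed observation window
        self._attempts = defaultdict(deque)   # sender IP -> attempt timestamps

    def reply_for(self, sender_ip: str, domain: str, primary_host: str,
                  now: float) -> str:
        """Return the rejection line for an attempt made while the primary is up."""
        attempts = self._attempts[sender_ip]
        # Forget attempts that fall outside the observation window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        attempts.append(now)
        if len(attempts) > self.max_attempts:
            return ("550 5.7.1 You tried me, a backup MX, too often while "
                    "the primary MX is online.")
        return (f"551 {domain} Try the primary MX first while it is online; "
                f"please try {primary_host}")


tracker = MisuseTracker(max_attempts=2, window_seconds=60.0)
replies = [
    tracker.reply_for("192.0.2.1", "example.org", "mx1.example.org", now=t)
    for t in (0.0, 1.0, 2.0, 100.0)
]
```

With these settings the first two attempts get a 551, the third gets the 550 5.7.1 reply, and the attempt at `t = 100.0` gets a 551 again because the earlier attempts have aged out of the window.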
Going even further, the offender could be placed on a public blacklist
such as [SpamCop].
### The preference debate
There is a so-called [preference debate] on how a sender should react
when the primary MX is not accepting mail or is offline. Some
implementers interpret the RFCs as stating "immediately try an MX with a
higher preference number", while others interpret it as "wait some time,
and then try an MX with a higher preference number". Also, one could
interpret it as "wait some time, and then try them all again, in order".
Some mail servers will wait for some time before trying the backup MX
after failing to deliver to the primary MX. So the following sequence
could occur:
1. Sender tries to contact primary MX and fails; queues message.
2. Sender waits for some time.
3. Primary MX comes online again.
4. Sender tries backup MX.
5. Backup MX replies with 551 because the primary MX is back online.
6. Sender gives up.
Of course, the sender should not give up.
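
One way a sender could avoid giving up in the sequence above is sketched here. This is a hypothetical sender policy, not behaviour required by the RFCs; the function name and the returned labels are made up for the illustration.

```python
def sender_next_step(reply_code: int, contacted_backup: bool) -> str:
    """Decide what a sender does after an SMTP reply (hypothetical policy)."""
    if 200 <= reply_code < 300:
        return "delivered"
    if reply_code == 551 and contacted_backup:
        # The backup MX reports that the primary is reachable again:
        # keep the message queued and go back to the primary MX.
        return "requeue_for_primary"
    if 400 <= reply_code < 500:
        # Transient failure: keep the message queued and retry later.
        return "requeue"
    # Any other permanent failure is reported back to the originator.
    return "bounce"


step = sender_next_step(551, contacted_backup=True)
```

In this sketch a 551 from a backup MX is treated as a pointer back to the primary MX rather than as a final rejection, which is the behaviour the proposal relies on.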
Some domains use [Nolisting] (Poor Man's Greylisting) and have a dummy
primary MX. Although the claim is made that Nolisting is RFC compliant,
I suggest that it is not in the spirit of the RFCs to list a false
primary MX in the DNS *on purpose*. However, my proposal does not
conflict with Nolisting: when the backup MX tests for connectivity with
the dummy primary MX, the connection fails, so the backup MX accepts
the incoming mail. In practice, most MXs listed as backup MX in the
Nolisting approach will behave as if they were a primary MX.
[RFC 5321]: http://tools.ietf.org/html/rfc5321
[RFC 2821]: http://www.ietf.org/rfc/rfc2821.txt
ietf-smtp mailing list
ietf-smtp <at> ietf.org