Re: Interaction of TCP Window Size and BGP Keepalive Behaviour
Shakir, Rob <Rob.Shakir <at> cw.com>
2010-07-06 11:05:51 GMT
Thanks for the responses thus far.
> -----Original Message-----
> From: Paul Jakma [mailto:paul <at> jakma.org]
> Sent: 06 July 2010 09:39
> To: Shakir, Rob
> Cc: idr <at> ietf.org
> Subject: Re: [Idr] Interaction of TCP Window Size and BGP
> Keepalive Behaviour
>> In this case we expect: a) rtr-B does not send any BGP packet
>> (KEEPALIVE/UPDATE/NOTIFICATION) to rtr-A in normal operating
> Why do you think this? Or at least, from what level are you
> considering this?
> The BGP implementation on B should continue to generate BGP messages.
> The TCP implementation on B should not send them yet though, should
This would appear to me to be an internal process of rtr-B. In this
circumstance I find it useful to consider what rtr-A sees. If rtr-B is
queueing packets, rtr-A should not see them, provided the TCP stack
behaves as it should. If the window remains zero for a period greater
than the hold time, then rtr-B will have no opportunity to send the
packet(s) before the session should be torn down, unless rtr-A ignores
the hold timer. If rtr-B does send packets then, as long as rtr-A
complies with RFC 793, they should never be received by the BGP process,
since only ACK segments (zero-window probes) are valid whilst the window
is zero.
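As an aside, it is worth noting how little of this is visible to the
sending BGP process through a typical sockets API. The sketch below
(assumed setup, not any particular router's implementation) shows that a
peer's closed window only surfaces indirectly, once the local send
buffer fills and send() starts failing:

```python
# Sketch: a zero peer window is invisible to the application until the
# local socket send buffer fills. Until then, send() succeeds even
# though nothing is reaching the peer's application.
import socket

a, b = socket.socketpair()   # 'a' stands in for rtr-B's side
a.setblocking(False)
# 'b' never reads, so its receive buffer fills and the advertised
# window eventually drops to zero.

sent = 0
try:
    while True:
        sent += a.send(b"\x00" * 4096)   # KEEPALIVE stand-in bytes
except BlockingIOError:
    pass   # local buffer full; this is the only signal we get

# Every send() up to this point succeeded from the BGP process's point
# of view; nothing tells it whether any byte was delivered.
a.close()
b.close()
```

This is, I think, why a requirement for BGP to react to the peer's
window state would be unusual: standard APIs simply do not expose it.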
The interesting point around the above observation is that, in my
opinion, this appears to be quite harmful behaviour. Where there is some
transport disruption between rtr-A and rtr-B, and packets are generated
by the BGP process but queued by the underlying TCP implementation, what
happens when rtr-B must send a NOTIFICATION to indicate that the hold
time has expired (since rtr-A is not sending rtr-B KEEPALIVEs)? In this
scenario, I would expect to see the session torn down not at t=HOLDTIME,
but only when rtr-A signals a non-zero window, at which point we
observe:
1) A flood of KEEPALIVEs (one for each keepalive interval that elapsed
during the congestion); followed by
2) A NOTIFICATION, since there was transport disruption.
If this were the case, then there is a relatively unbounded period of
time (assuming that TCP does not deal with this - which is perhaps
unrealistic) for which the prefixes that we are holding in the RIB via
this session are of unknown validity.
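To make the timing concrete, here is a toy timeline of the scenario
described above, using hypothetical (though common) timer values:

```python
# Toy model of the queued-messages scenario: KEEPALIVEs accumulate in
# the TCP send queue while the peer's window is zero, and the
# NOTIFICATION for the expired hold timer queues behind them.
HOLD_TIME = 90            # seconds; a common default
KEEPALIVE_INTERVAL = 30   # typically one third of the hold time
ZERO_WINDOW_PERIOD = 240  # seconds for which rtr-A advertises window=0

# KEEPALIVEs generated (and queued) while the window was closed:
queued_keepalives = ZERO_WINDOW_PERIOD // KEEPALIVE_INTERVAL

# rtr-B's hold timer for rtr-A expires during the outage, so a
# NOTIFICATION is also generated -- but it too sits in the queue:
notification_queued = ZERO_WINDOW_PERIOD > HOLD_TIME

# The teardown is observed not at t=HOLD_TIME but only when the window
# reopens:
observed_teardown = ZERO_WINDOW_PERIOD if notification_queued else None
```

With these numbers the session outlives its hold time by 150 seconds,
during which the RIB contents are of unknown validity.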
>> b) rtr-A does not expect any KEEPALIVE/UPDATE packets from rtr-B. The
>> session remains established even if no packet is received in the
> As above, this would be surprising to me. See below.
I believe this is required behaviour if one is to utilise a TCP window
size of zero to indicate congestion on a BGP session. Any other approach
(where some process has a requirement to set the window size to zero)
would result in the transport requiring one type of behaviour from the
peer, yet BGP requiring an incompatible one. Avoiding this would, of
course, assume some form of IPC between the TCP stack and the BGP daemon.
> This justifies B tearing down a session if it does not receive a BGP
> KEEPALIVE. I'm not sure how it justifies that the BGP protocol must be
> able to have special insight and control into what TCP does (i.e. BGP
> being aware that TCP is throttling) - which is what would be required
> for BGP on B to stop generating messages to A, and/or for BGP on A to
> not expect BGP messages from B.
> Given that TCP has its own keepalive mechanism (and there are standard
> APIs for enabling it), and given that the BGP designers chose to also
> have further keepalives at the BGP layer, it seems clear to me that
> the BGP designers intended for BGP KEEPALIVE to measure liveness from
> one BGP stack layer to other - /not/ just the TCP layer. I.e. it seems
> clear there is an intentional layering, and that BGP does not intend
> to be defer its liveness tests to TCP.
This would suggest that BGP attempting to utilise TCP for congestion
control goes against this idea. If the object of the KEEPALIVE is to
check that the BGP daemon on the other side is alive, then should we not
avoid using TCP to control the flow? After all, for the period during
which we do so, we have almost required the remote peer not to send the
very data that would indicate that its BGP daemon is alive.
I also feel that perhaps I did not articulate myself properly here - one
of the behaviours whose validity I am interested in is the session being
torn down because a KEEPALIVE could not be sent within the hold time. I
think the analysis you've presented implies that the BGP daemon should
not do this: it has no requirement to know the state of the TCP session
over which it is being transported, and hence, to its own knowledge, it
has generated KEEPALIVEs successfully. The fact that they have not been
received by the peer is not known to the BGP daemon.
> No such requirement needs be explicitly stated in the BGP RFC. It
> would be somewhat unconventional to have a protocol above TCP specify
> itself to depend very intimately on the current internal state of TCP
> - state which I'm not sure TCP implementations even make available to
> their users (how would you do it, using typical sockets APIs, out of
> curiosity?). I.e it would be a layering violation.
>> I'd very much appreciate comments on whether this behaviour should be
>> expected, and how those implementing a ground-up BGP-4 implementation
>> should treat this scenario?
> It seems expected and normal to me. :)
Hmm, I am not sure that I understand you completely here. If rtr-A tore
down the session after t=HOLDTIME, I'd consider this "expected and
normal" (no keepalives received in hold time, therefore NOTIFICATION is
required), but, in this case, I appear to see that rtr-B sends the
NOTIFICATION because it could not send a KEEPALIVE.
Whilst I see your implementation points, my issue here is that there
seems to be an inherent problem in throttling packets by means of TCP
when there is a timer tracking packets that may be throttled in this
way. To avoid this, I would suggest that some language be added to RFC
4271 that explicitly prohibits the use of underlying layers for
controlling the flow of BGP packets, and requires implementations that
desire such message-pacing behaviour to implement internal queueing of
messages within the BGP process itself.
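As an illustration of what I mean by internal queueing (a sketch of one
possible approach, not anything mandated by an RFC): if the BGP process
keeps its own outbound queue, it can coalesce KEEPALIVEs so that a long
blocked period never produces the flood described earlier:

```python
# Sketch of application-level message pacing: at most one KEEPALIVE is
# ever pending, since a newer one makes any queued ones redundant.
from collections import deque


class OutboundQueue:
    def __init__(self):
        self._q = deque()

    def enqueue(self, msg_type, payload=b""):
        # Coalesce KEEPALIVEs; other message types queue normally.
        if msg_type == "KEEPALIVE":
            self._q = deque(m for m in self._q if m[0] != "KEEPALIVE")
        self._q.append((msg_type, payload))

    def drain(self):
        # Called when the transport is writable again.
        msgs, self._q = list(self._q), deque()
        return msgs


q = OutboundQueue()
for _ in range(8):                       # 8 keepalive ticks while blocked
    q.enqueue("KEEPALIVE")
q.enqueue("NOTIFICATION", b"\x04\x00")   # hold timer expired meanwhile
released = q.drain()                     # one KEEPALIVE, one NOTIFICATION
```

Under this scheme, when the window reopens the peer sees at most one
stale KEEPALIVE followed by the NOTIFICATION, rather than a burst.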
Apologies if I've missed something obvious in the above. I'm just
unclear as to the justification for what we believe we observe, rather
than either the session remaining established, or rtr-A causing the
session to be torn down.
Many thanks in advance.
Rob Shakir <rob.shakir <at> cw.com>
IP&D Network Designer Cable&Wireless Worldwide
Idr mailing list
Idr <at> ietf.org