Margaret,
As noted, the draft describes transmission of IP (v4 and v6) over InfiniBand's connected modes. The underlying layer is not TCP/v6.
Your concern about retransmissions across the layers is valid. Some background:
The IB-level MTU may theoretically be set as high as 2^31 in the connected modes. However, it will likely be limited to a much smaller size as set by the peers; the method is described in the draft. The data will, however, be broken down to the physical link MTU (2K or 4K). In 'unreliable connected' mode, packets may be dropped and no retransmission is attempted at the IB layer. The reliable connected (RC) mode guarantees in-order delivery: if any packet is dropped, the RC layer will retransmit it. My discussions with various IBTA folks indicate that they expect the retransmission timeouts to be on the order of milliseconds.
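To make the segmentation arithmetic concrete, here is a minimal sketch of how many link-level packets one IB message is broken into. The function name and the example sizes are illustrative assumptions, not taken from the draft; 2K and 4K are taken as 2048 and 4096 bytes.

```python
import math

def link_packets(message_bytes: int, link_mtu: int) -> int:
    """Number of link-MTU-sized packets needed to carry one message.

    Hypothetical helper for illustration only; it ignores per-packet
    headers and simply divides the payload by the link MTU.
    """
    if message_bytes <= 0 or link_mtu <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(message_bytes / link_mtu)

# A 64K IP datagram over 2K and 4K link MTUs.
print(link_packets(64 * 1024, 2048))  # 32
print(link_packets(64 * 1024, 4096))  # 16

# The theoretical 2^31 upper bound over a 4K link MTU.
print(link_packets(2**31, 4096))      # 524288
```

The larger the message relative to the link MTU, the more link-level packets must all arrive for the message to be delivered, which is why the peers are expected to settle on a much smaller MTU than the theoretical maximum.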
Would a discussion in the draft of the possible interaction, and of the need to tune the retransmission timers at the RC layer accordingly, be sufficient? After all, such issues should be common to any implementation of TCP over a reliable protocol.
Vivek
--
Vivek Kashyap
Linux Technology Center, IBM
vivk <at> us.ibm.com
kashyapv <at> us.ibm.com
Ph: 503 578 3422 T/L: 775 3422
Margaret Wasserman <margaret <at> thingmagic.com>
Sent by: ipoverib-bounces <at> ietf.org
12/11/2004 05:54 PM
To: <Bill <at> strahm.net>
cc: ipoverib <at> ietf.org
Subject: RE: [Ipoverib] comments on draft-kashyap-ipoib-connected-mode-02.txt
Hi Bill,
At 6:53 AM -0800 12/11/04, Bill Strahm wrote:
>OK - a couple of misconceptions
I am sorry for my limited understanding of InfiniBand... Hopefully
we all have enough information between us to reach the right answers
here...
>I do not believe there will be a problem however because the timers are
>on such a different scale. Let me try and give an example... There are
>retries in Ethernet (I am talking about collision detection and back
>off) - What if we decided to worry about TCP retransmitting because the
>Layer 2 packet hasn't been able to get on the wire and the TCP timers
>went off and started retransmitting ? People don't worry because the
>timers are SO different - this will be even more pronounced in the IB
>world.
I think I understand your point about timers, but I don't really
understand how this works... Does IB RC actually retransmit lost
packets? Or does it just have some type of collision-detection &
back-off mechanism, like Ethernet? If it retransmits, how can it do
that without waiting for at least one round-trip-time to know that
data has been lost? And, if it doesn't retransmit, how can it
guarantee reliability if data is lost?
Ethernet does not actually retransmit packets... an Ethernet device
will detect a collision during transmission of the packet preamble
and wait a very short period before trying again. This time is much
shorter than the Ethernet round-trip-time, so it is guaranteed (if I
understand this correctly) to be shorter than the smallest possible
TCP retransmission timer (which will continuously decrease if things
are going well until it gets quite close to the round trip time for
the TCP connection).
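As a rough illustration of how the TCP retransmission timer tightens toward the round-trip time when things are going well, here is a sketch of the standard smoothed-RTT estimator (RFC 6298, previously RFC 2988). The function name and the sample values are assumptions for illustration. The sketch deliberately omits the minimum RTO that RFC 6298 recommends (1 second) and that real stacks enforce, so it shows only the unclamped estimate.

```python
def rto_trace(rtt_samples, granularity=0.0, alpha=1/8, beta=1/4):
    """Return the unclamped RTO after each RTT sample (seconds).

    Follows the SRTT/RTTVAR update rules of RFC 6298; no lower or
    upper bound is applied to the result (an assumption made here
    to expose the trend).
    """
    srtt = rttvar = None
    trace = []
    for r in rtt_samples:
        if srtt is None:
            # First measurement initialises both estimators.
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR must be updated using the old SRTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        trace.append(srtt + max(granularity, 4 * rttvar))
    return trace

# A perfectly steady 100 ms round trip: the RTO starts at about 3x RTT
# and decays toward the RTT itself.
trace = rto_trace([0.100] * 30)
print(round(trace[0], 3), round(trace[-1], 4))
```

With steady samples the variance term shrinks by a quarter on every measurement, so the unclamped RTO converges on the smoothed RTT. Any lower-layer recovery time has to be compared against that figure (or against the stack's minimum RTO, whichever is larger).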
The fact that Ethernet does not actually retransmit is also
important, as you don't need to worry about retransmissions at the
Ethernet layer exacerbating a congestion condition.
>I don't think we will be able to move the MTU above 64K - just because I
>believe the IP packet size is (was ??? Has the size parameter gone up ?)
>64K so don't worry about someone trying to stick a 2G IP packet onto the
>wire. At the same time if it is possible to specify a single IP packet
>that is 2G, maybe we need some text stating the MTU should be limited to
>something sane
IPv6 includes a Jumbo Payload hop-by-hop option which allows the
transmission of IPv6 packets with payloads of >64K bytes. RFC 2147
(TCP and UDP over IPv6 Jumbograms) defines how to use TCP and UDP
with IPv6 Jumbograms. I don't know if either of these mechanisms is
widely implemented, though.
Margaret
_______________________________________________
IPoverIB mailing list
IPoverIB <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ipoverib