Margaret Wasserman | 10 Dec 2004 18:19

Re: comments on draft-kashyap-ipoib-connected-mode-02.txt


Hi All,

I also have some comments on draft-kashyap-ipoib-connected-mode-02.txt.

I haven't thoroughly reviewed the document, so I may have some 
further comments later, but in my initial read, one major concern 
jumped out at me.

If I understand correctly, the reliable connected mode of IPOIB is 
TCP/IPv6.  Is that correct?

If so, I think that you need to carefully consider the issues 
associated with tunneling TCP inside of TCP.  In particular, I am 
concerned about this section:

        The default MTU of the IPoIB-CM interface is 2044 octets i.e.
         2048 octet IPoIB-link MTU minus the 4 octet encapsulation
         header.

         The connected modes of InfiniBand allow message sizes up to 2^31
         octets.  Therefore, IPoIB-CM can use a much larger MTU for
         unicast communication between any two endpoints. At the same
         time the maximum and/or optimal payload that can be received or
         sent over an InfiniBand connection is dependent on the
         implementation, HCA and the resources configured.

Having a larger MTU at the higher-level TCP layer than at the 
lower-level TCP layer may lead to situations where the upper and 
lower layers' retransmit timers will both fire at the same time, 
causing real problems in the case of congestion.

Michael Krause | 10 Dec 2004 21:54

Re: comments on draft-kashyap-ipoib-connected-mode-02.txt

At 09:19 AM 12/10/2004, Margaret Wasserman wrote:

>Hi All,
>
>I also have some comments on draft-kashyap-ipoib-connected-mode-02.txt.
>
>I haven't thoroughly reviewed the document, so I may have some further comments later, but in my initial read, one major concern jumped out at me.
>
>If I understand correctly, the reliable connected mode of IPOIB is TCP/IPv6.  Is that correct?

No.  It uses InfiniBand RC or UC to communicate IP datagrams (v4 / v6) between connected endnodes.

>If so, I think that you need to carefully consider the issues associated with tunneling TCP inside of TCP.  In particular, I am concerned about this section:
>
>       The default MTU of the IPoIB-CM interface is 2044 octets i.e.
>        2048 octet IPoIB-link MTU minus the 4 octet encapsulation
>        header.
>
>        The connected modes of InfiniBand allow message sizes up to 2^31
>        octets.  Therefore, IPoIB-CM can use a much larger MTU for
>        unicast communication between any two endpoints. At the same
>        time the maximum and/or optimal payload that can be received or
>        sent over an InfiniBand connection is dependent on the
>        implementation, HCA and the resources configured.
>
>Having a larger MTU at the higher-level TCP layer than at the lower-level TCP layer may lead to situations where the upper and lower layers' retransmit timers will both fire at the same time, causing real problems in the case of congestion.
>
>This may happen because both TCP layers will use the same algorithm (and maybe the same code) for calculating the retransmission time. When a packet is lost, the lower layer's TCP retransmission timer will fire, causing a retransmission of one lower-layer MTU of data (2048 bytes), AND the upper layer's TCP retransmission timer will also fire, causing a retransmission of one upper-layer MTU of data (up to 2^31 bytes).  The lower layer will not recognize that the upper layer's retransmission is duplicate data, so this will result in the transmission of many 2K packets when a single 2K packet is lost.
>
>We may want to talk to the Transport area to determine if there are other issues associated with tunneling TCP inside of TCP/IP.

The rest is based on a misconception, as there is only IB below IP and no tunneling of TCP over TCP occurs.

Mike

>Thoughts?
>
>Margaret


_______________________________________________
IPoverIB mailing list
IPoverIB <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ipoverib
Margaret Wasserman | 11 Dec 2004 13:47

Re: comments on draft-kashyap-ipoib-connected-mode-02.txt

Michael Krause wrote:
>No.  It uses InfiniBand RC or UC to communicate IP datagrams (v4 / 
>v6) between connected endnodes.

And what is InfiniBand RC, under the covers?

>Rest is based on a misconception as there is only IB below IP and no 
>tunneling of TCP over TCP occurs.

Well, I know that there is another IP(v6) in there, as IB is very 
closely based on IPv6.  Is RC based on TCP? SCTP?  Or did IB define a 
different reliable, connection-oriented IP protocol?  If the latter, 
what type of retransmission and congestion control algorithms are 
used by IB RC?  And, has there been any study regarding how they 
would interact with the retransmission and congestion control 
algorithms of TCP or other reliable upper-layer transports?

Even if this isn't TCP, per se, there may still be issues, but it 
will take more careful research to determine what they are.

Margaret
Bill Strahm | 11 Dec 2004 15:53

RE: comments on draft-kashyap-ipoib-connected-mode-02.txt

OK - a couple of misconceptions

IB Layer 3 is NOT IPv6.  It has this thing that looks ALMOST like an
IPv6 header - but it does routing completely differently.  In fact, it
isn't until you do IB routing that you would even add a GRH (Global
Route Header), which is what ends up looking like an IPv6 header.  If
you are staying on the local IB subnet - you are only required to add an
LRH (Local Route Header), which looks nothing like IPv6.

The reliability mechanism is not TCP therefore.  I will agree with
Michael on this one.  That said - the same problems that you discuss can
happen between the layer 4 IB and the Layer 4 TCP/IP.

I do not believe there will be a problem however because the timers are
on such a different scale.  Let me try and give an example... There are
retries in Ethernet (I am talking about collision detection and back
off) - What if we decided to worry about TCP retransmitting because the
Layer 2 packet hasn't been able to get on the wire and the TCP timers
went off and started retransmitting ?  People don't worry because the
timers are SO different - this will be even more pronounced in the IB
world.
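
To put rough numbers on the timer-scale argument, here is a minimal sketch. All figures are assumptions chosen for illustration; none of them come from the IB specification or the draft. It assumes a lower-layer retry timeout of a few milliseconds, a small retry count, and an upper-layer TCP retransmission timeout of about one second. The function names are hypothetical.

```python
def worst_case_recovery_ms(retry_timeout_ms: float, max_retries: int) -> float:
    """Longest time the lower layer may spend retrying one packet."""
    return retry_timeout_ms * max_retries


def upper_layer_fires_first(lower_recovery_ms: float, tcp_rto_ms: float) -> bool:
    """True if TCP would retransmit while the lower layer is still retrying."""
    return tcp_rto_ms <= lower_recovery_ms


# Assumed, illustrative values (not from the thread or any specification).
LOWER_TIMEOUT_MS = 4.0   # per-attempt lower-layer timeout
LOWER_RETRIES = 7        # attempts before the lower layer gives up
TCP_RTO_MS = 1000.0      # upper-layer retransmission timeout

recovery = worst_case_recovery_ms(LOWER_TIMEOUT_MS, LOWER_RETRIES)
print(recovery, upper_layer_fires_first(recovery, TCP_RTO_MS))
```

Under these assumptions the lower layer has finished retrying long before TCP's timer expires. The two layers would only collide if the lower-layer recovery time approached the upper-layer timeout.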

I don't think we will be able to move the MTU above 64K - just because I
believe the IP packet size is (was ??? Has the size parameter gone up ?)
64K so don't worry about someone trying to stick a 2G IP packet onto the
wire.  At the same time if it is possible to specify a single IP packet
that is 2G, maybe we need some text stating the MTU should be limited to
something sane.

Bill


Margaret Wasserman | 12 Dec 2004 02:54

RE: comments on draft-kashyap-ipoib-connected-mode-02.txt


Hi Bill,

At 6:53 AM -0800 12/11/04, Bill Strahm wrote:
>OK - a couple of misconceptions

I am sorry for my limited understanding of Infiniband...  Hopefully 
we all have enough information between us to reach the right answers 
here...

>I do not believe there will be a problem however because the timers are
>on such a different scale.  Let me try and give an example... There are
>retries in Ethernet (I am talking about collision detection and back
>off) - What if we decided to worry about TCP retransmitting because the
>Layer 2 packet hasn't been able to get on the wire and the TCP timers
>went off and started retransmitting ?  People don't worry because the
>timers are SO different - this will be even more pronounced in the IB
>world.

I think I understand your point about timers, but I don't really 
understand how this works...  Does IB RC actually retransmit lost 
packets?  Or does it just have some type of collision-detection & 
back-off mechanism, like Ethernet?  If it retransmits, how can it do 
that without waiting for at least one round-trip-time to know that 
data has been lost?  And, if it doesn't retransmit, how can it 
guarantee reliability if data is lost?

Ethernet does not actually retransmit packets... an Ethernet device 
will detect a collision during transmission of the packet preamble 
and wait a very short period before trying again.  This time is much 
shorter than the Ethernet round-trip-time, so it is guaranteed (if I 
understand this correctly) to be shorter than the smallest possible 
TCP retransmission timer (which will continuously decrease if things 
are going well until it gets quite close to the round trip time for 
the TCP connection).

The fact that Ethernet does not actually retransmit is also 
important, as you don't need to worry about retransmissions at the 
Ethernet layer exacerbating a congestion condition.

>I don't think we will be able to move the MTU above 64K - just because I
>believe the IP packet size is (was ??? Has the size parameter gone up ?)
>64K so don't worry about someone trying to stick a 2G IP packet onto the
>wire.  At the same time if it is possible to specify a single IP packet
>that is 2G, maybe we need some text stating the MTU should be limited to
>something sane

IPv6 includes a Jumbo Payload hop-by-hop option which allows the 
transmission of IPv6 packets with payloads of >64K bytes.  RFC 2147 
(TCP and UDP over IPv6 Jumbograms) defines how to use TCP and UDP 
with IPv6 Jumbograms.  I don't know if either of these mechanisms is 
widely implemented, though.
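
A minimal sketch of the size limits involved: the IPv6 Payload Length field is 16 bits wide, and the Jumbo Payload option carries a 32-bit length. The function name is hypothetical and is used only for illustration.

```python
IPV6_MAX_PAYLOAD = 2**16 - 1    # 16-bit Payload Length field: 65,535 octets
JUMBO_MAX_PAYLOAD = 2**32 - 1   # 32-bit Jumbo Payload Length field


def needs_jumbo_option(payload_octets: int) -> bool:
    """Return True if the payload only fits with the Jumbo Payload option."""
    if not 0 <= payload_octets <= JUMBO_MAX_PAYLOAD:
        raise ValueError("payload does not fit in a single IPv6 packet")
    return payload_octets > IPV6_MAX_PAYLOAD


print(needs_jumbo_option(2044))    # a payload the size of the default IPoIB-CM MTU
print(needs_jumbo_option(2**31))   # a payload the size of the largest IB CM message
```

So a 2^31-octet message would be representable as a single IPv6 jumbogram, but not as an ordinary IPv6 packet.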

Margaret
Michael Krause | 13 Dec 2004 18:57

Re: comments on draft-kashyap-ipoib-connected-mode-02.txt

At 04:42 AM 12/11/2004, Margaret Wasserman wrote:
>Michael Krause wrote:
>>No.  It uses InfiniBand RC or UC to communicate IP datagrams (v4 / v6) between connected endnodes.
>
>And what is InfiniBand RC, under the covers?

An InfiniBand reliable connection, which is an IB transport type.  It isn't TCP or SCTP.

>>Rest is based on a misconception as there is only IB below IP and no tunneling of TCP over TCP occurs.
>
>Well, I know that there is another IP(v6) in there, as IB is very closely based on IPv6.

I co-developed / led IB and, aside from trying to align the addressing to IPv6, it isn't IPv6.  It was never my intention to have it be IPv6.

>Is RC based on TCP? SCTP?

No.

>Or did IB define a different reliable, connection-oriented IP protocol?

It is a connection-oriented protocol that is not IP based.

>If the latter, what type of retransmission and congestion control algorithms are used by IB RC?

Retransmission is via a strongly ordered scheme, thus there isn't the concept of SACK, etc.  This keeps it rather thin and easy to implement in hardware.  Congestion control was recently added and uses a variation of notification as well as selective back-off algorithms.

>And, has there been any study regarding how they would interact with the retransmission and congestion control algorithms of TCP or other reliable upper-layer transports?

They are orthogonal, as IB only guarantees that a packet is delivered to an endnode.  It does not state anything about what occurs above.  So, in the case of IP over IB, if IP drops the datagram for whatever reason, the associated transport or ULP will take appropriate recovery actions if desired.

>Even if this isn't TCP, per se, there may still be issues, but it will take more careful research to determine what they are.

There are a variety of universities, etc. who are doing such research.  Try Ohio State to see what they have completed to date.

Mike
Michael Krause | 13 Dec 2004 19:07

RE: comments on draft-kashyap-ipoib-connected-mode-02.txt

At 06:53 AM 12/11/2004, Bill Strahm wrote:
>OK - a couple of misconceptions
>
>IB Layer 3 is NOT IPv6.  It has this thing that looks ALMOST like an
>IPv6 header - but it does routing completely differently.  In fact it
>isn't until you do IB Routing that you would even put a GRH (Global
>Route Header) that is what ends up looking like an IPv6 header.  If you
>are staying on the local IB subnet - you are only required to put a LRH
>(Local Route Header) that looks nothing like IPv6.
>
>The reliability mechanism is not TCP therefore.  I will agree with
>Michael on this one.  That said - the same problems that you discuss can
>happen between the layer 4 IB and the Layer 4 TCP/IP.
>
>I do not believe there will be a problem however because the timers are
>on such a different scale.  Let me try and give an example... There are
>retries in Ethernet (I am talking about collision detection and back
>off) - What if we decided to worry about TCP retransmitting because the
>Layer 2 packet hasn't been able to get on the wire and the TCP timers
>went off and started retransmitting ?  People don't worry because the
>timers are SO different - this will be even more pronounced in the IB
>world.

True.  One can configure the IB timers to be rather small or even infinite.  In practice, the timers will be no more than a couple of seconds in the worst case, but they were envisioned as being measured in milliseconds, since the bandwidths combined with the switch latencies are such that large timers do not make a lot of sense.

When we were developing the IB transports, a lot of discussion went into how things change when moving from a Gbps world to an N GBps world.  Things like congestion management become more complicated, as attempting to detect and adjust injection rates for something that may be a momentary burst can cause such oscillations in performance that one needs to consider two approaches:

1. Detect and monitor for N events in a given period of time.  If sustained, then examine injection rates or adjust path selection to reduce the number of events.  This is done on more of a global basis, as one may not want to limit this on a per-endnode basis, depending upon the type and priority at which a given endnode operates.

2. Detect, monitor and adjust on a per-endnode basis, assuming that all endnodes are of equal priority.

One can also adjust the VL arbitrations, etc. to effect change without impacting injection rate, simply because of the service rates for the VL and the implicit flow control that occurs, which will cause appropriate back-pressure.  The IBTA completed the congestion spec a couple of months ago and it is worth a read if people have interest.

>I don't think we will be able to move the MTU above 64K - just because I
>believe the IP packet size is (was ??? Has the size parameter gone up ?)
>64K so don't worry about someone trying to stick a 2G IP packet onto the
>wire.  At the same time if it is possible to specify a single IP packet
>that is 2G, maybe we need some text stating the MTU should be limited to
>something sane

TCP Jumbo I thought could go to 256K.

Mike

>Bill




David M. Brean | 12 Dec 2004 21:55

Re: comments on draft-kashyap-ipoib-connected-mode-02.txt

Hello,

Vivek Kashyap wrote:

snip ...

>
>ok..let me posit what seems to be the summary to me (looking for more comments
>from WG members here). I'm more or less reverting to the earlier version of 
>the draft.
>
>In an IPoIB subnet:
>
>	- Every interface MUST support IPoIB-UD
>
>	- An interface MAY optionally also support IPoIB-CM (one or both)
>		i.e. removing the mutually exclusive restriction on rc/uc
>		Note: IIRC, the same serviceID can be used for both RC/UC.
>		If not then they have to stay mutually exclusive.
>
>	- Interoperability is maintained by all nodes supporting IPoIB-UD. 
>	  Any two interfaces that do not have a connection mode in common will
>	  fall back to IPoIB-UD. 
>
>	- The support of any particular IB mode is indicated by the flags
> 	  in the link layer address. Note: IPoIB-UD is always supported and
>	  hence there are no flags to indicate UD support.
>
>	- An interface completes the IPoIB-UD address resolution and then
>          optionally MAY set up RC/UC connections based on the local support
>	  and received flags.
>
>	- A pure IPoIB-UD implementation ignores the RC/UC flags in link layer
>	  address in received packets. It zeroes them on transmit.
>
>	- Every implementation MUST accept all unicast transmissions received
>	  over any of the IPoIB modes it supports. Multicast/Broadcast by 
>	  their nature will be transmitted and received over the IPoIB-UD only.
>
>	***This implies that an interface MAY transmit/receive a packet
>		over any of RC or UC or UD depending on the modes supported
>	        between the peer IP and itself.***
>		
>	- It is an implementation's decision to connect or retry a connect on
>  	  failure on the CM modes. This decision is independently made per
>	  transmission or reception of a connection request.
>
>	- An implementation MAY make multiple connections to a peer. This
>          is a local decision. So is the decision of the peer to refuse
>          such a connection. 
>	
>	The serviceID, link setup, the link address flags, MTU negotiation etc.
>	are covered in the draft. 
>
>  
>
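
The rules in the summary above can be sketched roughly as follows. The flag values and names are hypothetical (the actual bit layout of the link-layer address flags is defined in the draft, not here). Preferring RC over UC when both are shared is an assumption of this sketch, not something the summary requires.

```python
from enum import Flag


class CmFlags(Flag):
    """Connected-mode support flags (hypothetical encoding)."""
    NONE = 0   # pure IPoIB-UD implementation: flags zeroed on transmit
    RC = 1
    UC = 2


def choose_mode(local: CmFlags, peer: CmFlags, multicast: bool = False) -> str:
    """Pick the IB transport for traffic from this interface to one peer."""
    if multicast:
        return "UD"            # multicast/broadcast always goes over IPoIB-UD
    common = local & peer      # modes both ends advertise
    if CmFlags.RC in common:
        return "RC"            # assumed preference when both are available
    if CmFlags.UC in common:
        return "UC"
    return "UD"                # no connected mode in common: fall back to UD


both = CmFlags.RC | CmFlags.UC
print(choose_mode(both, CmFlags.UC))
print(choose_mode(CmFlags.RC, CmFlags.UC))
print(choose_mode(both, CmFlags.NONE))
print(choose_mode(both, both, multicast=True))
```

Because every interface must support IPoIB-UD, the function always has a valid answer even when the two flag sets do not overlap.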

Wasn't there a suggested change to the alignment of the QPN in the 
service ID?  (See 
http://www1.ietf.org/mail-archive/web/ipoverib/current/msg01158.html).

>	- MTU -- we need to discuss more as below.
>
>  
>
snip ...

>
>The interface MTUs at the peers need not be the same at IP or IB layers.
>
>I agree with the concept of just exchanging the max receive MTU at the IB 
>connection setup. 
>
>  
>
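
A minimal sketch of what exchanging only the max receive MTU could look like. The class and field names are hypothetical, and the rule that a sender uses the smaller of its own send limit and the peer's advertised receive limit is an assumption for illustration, not text from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    max_send: int      # largest message this side is willing to send (octets)
    max_receive: int   # largest message this side can receive, advertised at setup


def send_mtu(sender: Endpoint, receiver: Endpoint) -> int:
    """MTU the sender may use toward the receiver on this connection."""
    return min(sender.max_send, receiver.max_receive)


a = Endpoint(max_send=65536, max_receive=32768)
b = Endpoint(max_send=16384, max_receive=65536)

print(send_mtu(a, b))   # direction a -> b
print(send_mtu(b, a))   # direction b -> a
```

The two directions can end up with different values, which matches the point that the interface MTUs at the peers need not be the same.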
In addition to all the above, there may still be some interest from 
folks in having the specification choose one of the *C transport types 
and a fixed logical MTU.

Also, the I-D mentions ARP and RARP as "protocol types" in the frame 
format.  Seems like these would not apply to connected modes.

-David
Vivek Kashyap | 13 Dec 2004 20:37

RE: comments on draft-kashyap-ipoib-connected-mode-02.txt

Margaret,

    As noted, the draft is describing transmission of IP (v4 and v6) over InfiniBand's connected modes. The underlying layer is not TCP/v6.

    Your concern about retransmissions across the layers is valid. Some background:

    The IB-level MTU may theoretically be set as high as 2^31 octets in the connected modes. However, it will likely be limited to a much smaller size as set by the peers - the method is described in the draft. The data will, however, be broken down to the physical link MTU (2K or 4K). Packets may be dropped in 'unreliable connected' mode, and no retransmission is attempted at the IB layer. The reliable connected (RC) mode guarantees in-order delivery. If any packet is dropped, the RC layer will retransmit. My discussions with the various IBTA folks indicate that they expect the retransmission timeouts to be in milliseconds.
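
As a rough worked example of the sizes discussed here: the 2048-octet link MTU and 4-octet encapsulation header come from the draft text quoted earlier in the thread. The function names are hypothetical, and per-packet IB header overhead is ignored, so the packet counts are lower bounds.

```python
ENCAP_HEADER_OCTETS = 4   # IPoIB encapsulation header, per the draft


def default_interface_mtu(ipoib_link_mtu: int = 2048) -> int:
    """Default IPoIB-CM interface MTU: link MTU minus the encapsulation header."""
    return ipoib_link_mtu - ENCAP_HEADER_OCTETS


def link_packets(message_octets: int, link_mtu: int) -> int:
    """Number of link-level packets needed to carry one IB message."""
    if message_octets < 0 or link_mtu <= 0:
        raise ValueError("sizes must be non-negative and the MTU positive")
    return -(-message_octets // link_mtu)   # ceiling division


print(default_interface_mtu())       # 2044
print(link_packets(65536, 2048))     # 32
print(link_packets(2**31, 2048))     # 1048576
print(link_packets(2**31, 4096))     # 524288
```

This also shows the scale of the concern raised earlier in the thread: resending a full 2^31-octet message over a 2K link would mean about a million link-level packets.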

    Is a discussion of the possible interaction, and of the requirement for relevant tuning of the retransmission timers at the RC layer, sufficient in the draft? After all, such issues should be common to any implementation of TCP over a reliable protocol.

    Vivek
    --
    Vivek Kashyap
    Linux Technology Center, IBM
    vivk <at> us.ibm.com
    kashyapv <at> us.ibm.com
    Ph: 503 578 3422 T/L: 775 3422


    
Vivek Kashyap | 13 Dec 2004 21:00

    Re: comments on draft-kashyap-ipoib-connected-mode-02.txt

    David,

    Addressing your comments here rather than inline since this mailer doesn't do that well:

    1. QPN alignment:
    Yes, there was a suggestion on QPN alignment. I'll address it in the next draft. Please send in any comments if you have any.

    2. "In addition to all the above, there may still be some interest from
    folks in having the specification choose one of the *C transport types
    and a fixed logical MTU."

    Is it not an implementation issue? One can choose to enable only the desired mode and always use the fixed MTU in all negotiations.

    3. ARP in IPoIB-CM.

    It is still part of the definition of IPoIB-CM. It is just that it won't be over the connected modes. We do ARP/RARP over the UD QP.

    Vivek

    --
    Vivek Kashyap
    Linux Technology Center, IBM
    vivk <at> us.ibm.com
    kashyapv <at> us.ibm.com
    Ph: 503 578 3422 T/L: 775 3422





    
