Hal Rosenstock | 16 Nov 2004 22:11

A Couple of IPoIB Questions

Hi,
 
I have a couple of questions relative to IPoIB:
 
1. draft-ietf-ipoib-ip-over-infiniband-07.txt states:
"Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID."
 
Isn't the broadcast group for IPv4? When the IPoIB interface is IPv6 only, does this group still need to be joined?
If not, where do the parameters for any IPv6 groups come from? I am presuming that this group needs to be joined in
the IPv6-only case. I just want to be sure.
 
2. Also, what is the latest status of Vivek's connected mode draft? Will it be moving forward?
 
Thanks.
 
-- Hal
_______________________________________________
IPoverIB mailing list
IPoverIB <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ipoverib
Kanoj Sarcar | 16 Nov 2004 22:33

Re: A Couple of IPoIB Questions

Hal Rosenstock wrote:
> Hi,

Hi,

>  
> I have a couple of questions relative to IPoIB:
>  
> 1. draft-ietf-ipoib-ip-over-infiniband-07.txt states:
> "Every IPoIB interface MUST "FullMember" join the IB multicast group
> defined by the broadcast-GID."
>  
> Isn't the broadcast group for IPv4 ? When the IPoIB interface is IPv6
> only, does this group still need be joined ?
> If not, where do the parameters for any IPv6 groups come from ? I am
> presuming that this group needs to be joined in
> the IPv6 only case. I just want to be sure.

Previously in the WG, we went through a discussion on this, and the
consensus was that all interfaces (irrespective of IPv4 only, IPv6 only,
or IPv4 and IPv6) MUST join the broadcast-GID and obtain parameters for
all IPv4 and IPv6 groups from this one single broadcast-GID. We further
discussed changing the signature part of the address of the broadcast
group to reflect that it was IPv4 and IPv6 agnostic, but kept the
IPv4 signature to make it easier for current implementations to make any
changes required to adapt to this rule.
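
As an illustration of the rule above, here is a minimal sketch of how the single broadcast-GID could be assembled. It assumes the field layout as I understand it from the IPoIB transmission draft (8 one bits, flags 0001, a 4-bit scope, the 16-bit IPv4 signature 0x401B, the 16-bit P_Key, 48 zero bits, 32 one bits); please check it against the draft text before relying on it. The function name and the default link-local scope are choices made for this example only.

```python
import ipaddress

# Signature kept in the broadcast-GID even for IPv6-only interfaces,
# per the WG consensus described above.
IPV4_SIGNATURE = 0x401B


def broadcast_gid(pkey: int, scope: int = 0x2) -> ipaddress.IPv6Address:
    """Build the 128-bit broadcast-GID for a partition (hypothetical helper)."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    if not 0 <= scope <= 0xF:
        raise ValueError("scope must fit in 4 bits")
    raw = (
        bytes([0xFF, 0x10 | scope])          # 11111111 | flags=0001 | scope
        + IPV4_SIGNATURE.to_bytes(2, "big")  # signature field
        + pkey.to_bytes(2, "big")            # partition key
        + bytes(6)                           # 48 zero bits
        + b"\xff" * 4                        # 32 one bits
    )
    return ipaddress.IPv6Address(raw)


# The same group is joined whether the interface runs IPv4, IPv6, or both.
print(broadcast_gid(0xFFFF))
```

An IPv6-only interface would call the same helper; there is no separate IPv6 variant of the broadcast group in this scheme.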

Thanks.

Kanoj

>  
> 2. ALso, what is the latest status of the Vivek's connected mode draft ?
> Will it be moving forward ?
>  
> Thanks.
>  
> -- Hal
Vivek Kashyap | 16 Nov 2004 23:31

Re: A Couple of IPoIB Questions


See below in <VK>

Vivek
--
Vivek Kashyap
Linux Technology Center, IBM
vivk <at> us.ibm.com
kashyapv <at> us.ibm.com
Ph: 503 578 3422 T/L: 775 3422



"Hal Rosenstock" <hnrose <at> earthlink.net>
Sent by: ipoverib-bounces <at> ietf.org

11/16/2004 01:11 PM
Please respond to Hal Rosenstock

       
        To:        "IPoverIB" <ipoverib <at> ietf.org>
        cc:        
        Subject:        [Ipoverib] A Couple of IPoIB Questions



Hi,
 
I have a couple of questions relative to IPoIB:
 
1. draft-ietf-ipoib-ip-over-infiniband-07.txt states:
"Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID."
 
Isn't the broadcast group for IPv4 ? When the IPoIB interface is IPv6 only, does this group still need be joined ?
If not, where do the parameters for any IPv6 groups come from ? I am presuming that this group needs to be joined in
the IPv6 only case. I just want to be sure.
 
<VK> Yes, the broadcast-GID is at the InfiniBand layer and MUST be joined whether you are running IPv4 or IPv6 above it. <VK>

2. Also, what is the latest status of Vivek's connected mode draft? Will it be moving forward?

<VK> I'll be submitting it as draft-ietf-ipoib-connected-mode-00.txt by the end of the month. There were some interesting suggestions that were made during the IETF WG meeting. Two of the suggestions of consequence are given below. The others we can discuss when the minutes are published (they include some additional requests on clarification on the transmission draft too).

a. The current draft makes the various modes mutually exclusive, i.e. RC, UC and UD are not allowed simultaneously in the same IP subnet. The thinking is that the mode is a link characteristic, and hence the link differs per connection mode. It was suggested that one be allowed to mix RC and UC. This goes back to the original suggestion in the first draft, which was:

IPoIB-UD must always be supported. Additionally, the interface can support both RC and UC, or one of them, or neither.

b. Another suggestion was to allow multiple connected mode links (i.e. at IB UC/RC level) between peers.

One thought can be 'yes, but user beware': The IB connections are made using the service ID that is derived from the QPN as described in the draft. If a second attempt succeeds then there are two links. It is up to the implementation to either allow or disallow multiple links.

Thoughts?

<VK>


 
Thanks.
 
-- Hal
Michael Krause | 17 Nov 2004 05:08

Re: A Couple of IPoIB Questions

At 02:31 PM 11/16/2004, Vivek Kashyap wrote:

<VK> I'll be submitting it as draft-ietf-ipoib-connected-mode-00.txt by the end of the month. There were some interesting suggestions that were made during the IETF WG meeting. Two of the suggestions of consequence are given below. The others we can discuss when the minutes are published (they include some additional requests on clarification on the transmission draft too).

a. The current draft makes the various modes mutually exclusive i.e. RC, UC and UD are not allowed simultaneously in the same IP subnet. The thought is that it is a link characteristic and hence different per connection mode. It was suggested that one be allowed to mix up RC/UC. This goes back to the original suggestion in the first draft which was:

IPoIB-UD must always be supported. Additionally, the interface can also support either both of RC and UC, or one of them. Or neither of them.

UD MUST always be supported.  I personally don't care whether one does RC or UC but I don't think both are required as a MAY option.  The advantage of RC is the send credit algorithm.  The advantage of UC is the lack of ACK packets.  ACK is noise in the fabric while send credits provide a simple method to maintain bandwidth / injection control on a per flow basis.

I see no problems with supporting both UD and *C on the same subnet; it is rather foolish to attempt to mandate these be on separate subnets.

b. Another suggestion was to allow multiple connected mode links (i.e. at IB UC/RC level) between peers.

One thought can be 'yes, but user beware': The IB connections are made using the service ID that is derived from the QPN as described in the draft. If a second attempt succeeds then there are two links. It is up to the implementation to either allow or disallow multiple links.

Again, this has been suggested in the past (though most of those involved in the original discussions have likely moved on, since much of that discussion occurred before the IETF working group was established).  There is obvious benefit to supporting multiple RC connections per endnode pair.  I do not see any technical reason to oppose this, nor any issue from an interoperability perspective.  There is no reason for a "user beware".  The work is rather straightforward to implement, and the benefit to customers is, again, rather obvious when one considers what the IB fabric offers and how connections can enable flows through multipath as well as transparent fail-over, flow scheduling, mapping of DiffServ to different arbitration / paths, etc.

Mike
Vivek Kashyap | 17 Nov 2004 08:38

Re: A Couple of IPoIB Questions



      Hi,

      I have a couple of questions relative to IPoIB:

      1. draft-ietf-ipoib-ip-over-infiniband-07.txt states:
      "Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID."

      Isn't the broadcast group for IPv4 ? When the IPoIB interface is IPv6 only, does this group still need be joined ?
      If not, where do the parameters for any IPv6 groups come from ? I am presuming that this group needs to be joined in
      the IPv6 only case. I just want to be sure.

      <VK> Yes, the broadcast-GID is at the InfiniBand layer and MUST be joined whether you are running at v4 or v6 layer. <VK>

      2. ALso, what is the latest status of the Vivek's connected mode draft ? Will it be moving forward ?

      <VK> I'll be submitting it as draft-ietf-ipoib-connected-mode-00.txt by the end of the month. There were some interesting suggestions that were made during the IETF WG meeting. Two of the suggestions of consequence are given below. The others we can discuss when the minutes are published (they include some additional requests on clarification on the transmission draft too).

      a. The current draft makes the various modes mutually exclusive i.e. RC, UC and UD are not allowed simultaneously in the same IP subnet. The thought is that it is a link characteristic and hence different per connection mode. It was suggested that one be allowed to mix up RC/UC. This goes back to the original suggestion in the first draft which was:

      IPoIB-UD must always be supported. Additionally, the interface can also support either both of RC and UC, or one of them. Or neither of them.


UD MUST always be supported.

<VK> That is and has always been the requirement right from the first draft. <VK>

I personally don't care whether one does RC or UC but I don't think both are required as a MAY option. The advantage of RC is the send credit algorithm. The advantage of UC is the lack of ACK packets. ACK is noise in the fabric while send credits provide a simple method to maintain bandwidth / injection control on a per flow basis.

I see no problems with supporting both UD and *C on the same subnet; it is rather foolish to attempt to mandate these be on separate subnets.

      <VK> As per the connected-mode draft the UD mechanism is *always* required; address resolution depends on it.

      The only point of discussion is whether all nodes must support the same link characteristics in the subnet, i.e. all are RC (and UD), or all are UC (and UD), or all are UD only. The alternative is to allow the nodes to be mixed, with some nodes being RC/UD, others UC/UD, a third set UD only, and yet others probably supporting all, within the same IP subnet. [Can the same serviceID be used by both RC and UC ?]

      The third alternative is to associate either UD only, or UD plus one of RC or UC, with the same interface. In such a case, if two nodes support mismatched connected modes, they fall back to UD. This option is not too different from the UD QP + RC or UC mechanism.

      <VK>

        b. Another suggestion was to allow multiple connected mode links (i.e. at IB UC/RC level) between peers.

        One thought can be 'yes, but user beware': The IB connections are made using the service ID that is derived from the QPN as described in the draft. If a second attempt succeeds then there are two links. It is up to the implementation to either allow or disallow multiple links.

    Again, this has been suggested in the past (though most who were involved in the original discussions years gone by are likely gone since much of this discussion occurred before the IETF workgroup was established).

    <VK> I'm one of the vestiges of those early times along with you and a few others...so we have hope :). <VK>

    There is obvious benefit to supporting multiple RC per endnode pair. I do not see any technical reason to oppose nor any issue from an interoperability perspective. There is no reason for a "user beware".

    <VK> It is not opposed. The 'user beware' only underscores that the peer interface might not support multiple links; it might enforce a limited number of connections (maybe only one) between a pair of GIDs. Similarly, an implementation not wanting to support multiple links MUST take steps to deny multiple requests.

    <VK>

    The work is rather straightforward to implement, and the benefit to customers is, again, rather obvious when one considers what the IB fabric offers and how connections can enable flows through multipath as well as transparent fail-over, flow scheduling, mapping of DiffServ to different arbitration / paths, etc.

    <VK> In addition, large MTU and APM (Automatic Path Migration) are two of the main reasons why I've been proposing IPoIB connected mode for so long. As far as IPoIB itself is concerned, all the connection parameters except the large MTU are hidden from it. <VK>

    Mike
    
    Michael Krause | 18 Nov 2004 01:46

    Re: A Couple of IPoIB Questions

    At 11:38 PM 11/16/2004, Vivek Kashyap wrote:



        Hi,

        I have a couple of questions relative to IPoIB:

        1. draft-ietf-ipoib-ip-over-infiniband-07.txt states:
        "Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID."

        Isn't the broadcast group for IPv4 ? When the IPoIB interface is IPv6 only, does this group still need be joined ? If not, where do the parameters for any IPv6 groups come from ? I am presuming that this group needs to be joined in the IPv6 only case. I just want to be sure.

        <VK> Yes, the broadcast-GID is at the InfiniBand layer and MUST be joined whether you are running at v4 or v6 layer. <VK>

        2. Also, what is the latest status of the Vivek's connected mode draft ? Will it be moving forward ?

        <VK> I'll be submitting it as draft-ietf-ipoib-connected-mode-00.txt by the end of the month. There were some interesting suggestions that were made during the IETF WG meeting. Two of the suggestions of consequence are given below. The others we can discuss when the minutes are published (they include some additional requests on clarification on the transmission draft too).

        a. The current draft makes the various modes mutually exclusive i.e. RC, UC and UD are not allowed simultaneously in the same IP subnet. The thought is that it is a link characteristic and hence different per connection mode. It was suggested that one be allowed to mix up RC/UC. This goes back to the original suggestion in the first draft which was:

        IPoIB-UD must always be supported. Additionally, the interface can also support either both of RC and UC, or one of them. Or neither of them.

    UD MUST always be supported.

    <VK> That is and has always been the requirement right from the first draft. <VK>

    I personally don't care whether one does RC or UC but I don't think both are required as a MAY option. The advantage of RC is the send credit algorithm. The advantage of UC is the lack of ACK packets. ACK is noise in the fabric while send credits provide a simple method to maintain bandwidth / injection control on a per flow basis.

    I see no problems with supporting both UD and *C on the same subnet; it is rather foolish to attempt to mandate these be on separate subnets.

      <VK> As per the connected-mode draft the UD mechanism is *always* required; address resolution depends on it.

      The only point of discussion is whether all nodes must support the same link characteristics in the subnet, i.e. all are RC (and UD), or all are UC (and UD), or all are UD only.

    Obviously I would oppose such a solution as it creates artificial constraints with little benefit.

      The alternative is to allow the nodes to be mixed, with some nodes being RC/UD, others UC/UD, a third set UD only, and yet others probably supporting all, within the same IP subnet. [Can the same serviceID be used by both RC and UC ?]

      The third alternative is to associate either UD only, or UD plus one of RC or UC, with the same interface. In such a case, if two nodes support mismatched connected modes, they fall back to UD. This option is not too different from the UD QP + RC or UC mechanism.

    KISS:

    - UD universal
    - *C opportunistic
            - Local management issue to control what is sent on the *C interface.  No need to specify
            - Advertise whether one or more ports are supported by UD or *C
            - Advertise whether one or more QP are supported by UD or *C
            - Let local management determine policy for what services are mapped where - no need to specify

    This is both an interoperable approach and simple to implement.  There may be some desire to add a policy interface to state preference for specific types of traffic over a given QP.  I would not oppose this but would view this as a separate draft once the basics are worked out.



      <VK>
        b. Another suggestion was to allow multiple connected mode links (i.e. at IB UC/RC level) between peers. One thought can be 'yes, but user beware': The IB connections are made using the service ID that is derived from the QPN as described in the draft. If a second attempt succeeds then there are two links. It is up to the implementation to either allow or disallow multiple links.

    Again, this has been suggested in the past (though most who were involved in the original discussions years gone by are likely gone since much of this discussion occurred before the IETF workgroup was established).

    <VK> I'm one of the vestiges of those early times along with you and a few others...so we have hope :). <VK>

    There is obvious benefit to supporting multiple RC per endnode pair. I do not see any technical reason to oppose nor any issue from an interoperability perspective. There is no reason for a "user beware".

    <VK> It is not opposed. The 'user beware' only underscores that the peer interface might not support multiple links; it might enforce a limited number of connections (maybe only one) between a pair of GIDs. Similarly, an implementation not wanting to support multiple links MUST take steps to deny multiple requests.

    *C requires CM to operate, so whether additional CM operations are accepted is a local issue.  A given requester node may issue N requests and a given responder may accept anywhere from 0 to N of them, as an implementation may limit the number of *C connections available for IP traffic.
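
The "requester issues N, responder grants 0 to N" behaviour can be sketched as a purely local admission check on the responder side. This is only an illustration: the class name, the per-peer limit parameter and the use of a GID string as the key are assumptions for the example, and the REP/REJ strings merely stand in for the corresponding IB CM replies.

```python
from collections import Counter


class Responder:
    """Local policy: cap the number of *C connections accepted per peer."""

    def __init__(self, max_per_peer: int):
        self.max_per_peer = max_per_peer  # 0 means connected mode is refused
        self.active = Counter()           # peer GID -> accepted connections

    def on_req(self, peer_gid: str) -> str:
        # Deny further requests once the local limit for this peer is reached.
        if self.active[peer_gid] >= self.max_per_peer:
            return "REJ"
        self.active[peer_gid] += 1
        return "REP"


responder = Responder(max_per_peer=2)
# A requester issuing three requests ends up with two links.
print([responder.on_req("fe80::1") for _ in range(3)])
```

Nothing here needs to be specified on the wire; a requester simply has to cope with some of its requests being rejected.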


    <VK>

    The work is rather straightforward to implement, and the benefit to customers is, again, rather obvious when one considers what the IB fabric offers and how connections can enable flows through multipath as well as transparent fail-over, flow scheduling, mapping of DiffServ to different arbitration / paths, etc.

    <VK> In addition Large MTU and APM are two of the main reasons why I've been proposing IPoIB-connected mode for so long. In terms of IPoIB itself, except for the Large MTU, the parameters are hidden from it.<VK>

    Mike
    
    Vivek Kashyap | 18 Nov 2004 07:46

    Re: A Couple of IPoIB Questions

    Mike, the formatting is really off in your last mail, which makes it difficult
    to follow.
    
    Other than that let us discuss in the context of the draft. The draft is
    built upon the following:
    
    1. IPoIB-RC and IPoIB-UC are optional.
    2. IPoIB connected mode depends on a UD QP for address resolution and multicast.
    
    As far as I know, there has been agreement on these two points since the earliest
    connected mode draft I posted.
    
    I'd like the WG to give input on the following issues:
    
    3. Where does the UD QP come from?  Choose one of:
    
    a. It is a UD QP that is associated with the interface at startup.
    
    b. It is a UD QP that is shared with IPoIB-UD.
    
    3a is more generic. It can be considered to include the case 3b.  The original
    proposal was limited to 3b.
    
    4. Link characteristics
    
    The broadcast domain for IPoIB-RC/UC is determined exactly as the
    IPoIB-UD case i.e. through the broadcast-GID. A UD as per 3 is used in this
    step.
    
    Do all interfaces in the IPoIB connected mode (CM) have the same link
    characteristics? i.e.
    
    a. all are either IPoIB-RC or IPoIB-UC.
    
    	-- There is also a UD QP associated. The UD QP will be either 3a or 3b
    	   based on WG consensus.
    
    	-- All unicast transmission is on the IPoIB mode i.e. RC or UC.
    
    b. all are IPoIB-UD. Additionally they can be one of IPoIB-RC or IPoIB-UC
    or both.
    
    	-- The presence of the flags indicates the type of communication possible.
    	-- The choice of mode for a given peer is determined by
    	   the supported modes and the local policy. Note that incompatible
    	   policies imply that the fallback is communication over UD.
    	-- The fallback mode of communication is UD.
    
    4b adds a lot of flexibility at the cost of a more complicated decision. 4a, by
    contrast, is straightforward.
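
To give a feel for how much decision logic option 4b actually implies, here is a small sketch. It assumes each interface advertises a set of flags (UD always, RC and/or UC optionally) and that local policy is just an ordered preference list; the enum, the function name and the default preference are all made up for the example and are not taken from the draft.

```python
from enum import Flag, auto


class Mode(Flag):
    UD = auto()
    RC = auto()
    UC = auto()


def pick_unicast_mode(local: Mode, peer: Mode,
                      preference=(Mode.RC, Mode.UC)) -> Mode:
    """Pick the mode for unicast traffic to one peer under option 4b."""
    # UD is mandatory on every interface; address resolution depends on it.
    if Mode.UD not in local or Mode.UD not in peer:
        raise ValueError("IPoIB-UD must always be supported")
    common = local & peer
    for mode in preference:       # local policy: first mutually supported mode
        if mode in common:
            return mode
    return Mode.UD                # mismatched connected modes fall back to UD


# An RC-capable node talking to a UC-only peer ends up on UD.
print(pick_unicast_mode(Mode.UD | Mode.RC, Mode.UD | Mode.UC))
```

Under 4a the function collapses to a constant, which is the simplicity being traded away.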
    
    5. MTU negotiation
    
    	In the private data field of the CM message the desired MTU is
    	included.
    
    	It was suggested during the IPoIB meeting at IETF that it need not be
    	symmetric. That is a good idea. Thus each peer declares the max MTU it
    	prefers:
    
    	REQ: <my desired MTU>
    	REP: <my desired MTU>
    	RTU:
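
A rough sketch of the asymmetric exchange above follows. Two things in it are assumptions rather than anything stated in this thread: the wire encoding (a 4-byte big-endian integer at the start of the private data) and the interpretation that the value a peer declares bounds what the other side sends towards it. The helper names are hypothetical.

```python
import struct


def pack_desired_mtu(mtu: int) -> bytes:
    """Encode the local desired MTU for REQ/REP private data (assumed layout)."""
    return struct.pack("!I", mtu)


def unpack_desired_mtu(private_data: bytes) -> int:
    (mtu,) = struct.unpack_from("!I", private_data)
    return mtu


def send_mtu(peer_private_data: bytes, local_send_limit: int) -> int:
    """MTU to use when sending to the peer: its declared value, capped locally."""
    return min(unpack_desired_mtu(peer_private_data), local_send_limit)


# REQ carries A's desired MTU, REP carries B's; the two directions may differ.
req = pack_desired_mtu(65520)   # from A
rep = pack_desired_mtu(2044)    # from B
print(send_mtu(rep, 65520))     # A -> B
print(send_mtu(req, 65520))     # B -> A
```

Since the RTU carries no MTU in this scheme, both values are known to both sides once the REP has been received.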
    
    6. Multiple connections for the same IP address
    
    	Local decision. Note that the peer might choose to not honour multiple
    	connections.
    
    Vivek
    
    
    __
    
    Vivek Kashyap
    Linux Technology Center, IBM
    
    Roland Dreier | 18 Nov 2004 15:56

    Re: A Couple of IPoIB Questions

        > Mike the format is really off in the last mail from you -
        > making it difficult to follow.
    
    Vivek, I think that if you used standard quoting in your replies
    instead of your own "<VK>" format, it would be much easier to follow
    email threads involving your replies.
    
    Thanks,
      Roland
    
    Michael Krause | 18 Nov 2004 16:09

    Re: A Couple of IPoIB Questions

    At 10:46 PM 11/17/2004, Vivek Kashyap wrote:
    Mike the format is really off in the last mail from you - making it difficult
    to follow.


    Other than that let us discuss in the context of the draft. The draft is
    built upon the following:

    1. IPoIB-RC and IPoIB-UC are optional.

    I would prefer only one be used - either RC or UC.  I've provided some logic for either one as a preference but don't see a reason to have both.  Having both just leads to options, which lead to interoperability problems.

    2. IPoIB connected mode depends on a UD QP for address resolution and multicast.

    As far as I know, there has been an agreement since the earliest connected mode
    draft I posted.


    I'd like the WG to give input on the following issues:

    3. Where does the UD QP come from?  Choose one of:

    a. It is a UD QP that is associated with the interface at startup.

    b. It is a UD QP that is shared with IPoIB-UD.


    3a is more generic. It can be considered to include the case 3b.  The original
    proposal was limited to 3b.

    From an implementation point of view, all of this will be hidden within the driver below IP.  As such, the driver will maintain the associations.  Currently, each driver "instance" (there may be multiple per IB port) will have at least 1 UD QP.  Given the existing protocol already defines how to share this QP with other nodes, why not just re-use it and avoid doing more work?  The driver can then map, on a per endnode pair basis, which *C QPs go with which UD QP, and the spec remains largely silent on how this is accomplished.

    4. Link characteristics

    The broadcast domain for IPoIB-RC/UC is determined exactly as the
    IPoIB-UD case i.e. through the broadcast-GID. A UD as per 3 is used in this
    step.

    Do all interfaces in the IPoIB-connected mode (CM) have the same link characteristics? i.e.

    From an implementation perspective, this is generally simplest.

    a. all are either IPoIB-RC or IPoIB-UC.

    Preference is only 1 to be defined.


            -- There is also a UD QP associated. The UD QP will be either 3a or 3b
               based on WG consensus.

            -- All unicast transmission is on the IPoIB mode i.e. RC or UC.

    For a given endnode pair, the policy of which QP is used for a given unicast IP datagram is really a local issue.  I see some merit in the attempt to bifurcate this to multicast / broadcast to the UD QP and unicast to the *C QP.  However, if the datagram fits in the PMTU of the UD QP, then either could be used.  The driver would work in either case.  Please keep in mind that multiple *C QPs can be used and their usage needs to be a local issue and not defined within the spec.

    b. all are IPoIB-UD. Additionally they can be one of IPoIB-RC or IPoIB-UC
    or both.

            -- The presence of the flags indicates the type of communication possible.
            -- The decision of communicating using a specific mode is determined by
               the supported modes and the local policy. Note that incompatible
               policies imply that the fallback is communication over UD.
            -- fallback mode of communication is UD


    4b adds a lot of flexibility at the expense of a simple decision. 4a. by
    contrast is straightforward.
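
To make the fallback rule in 4b concrete, here is a rough Python sketch. It is an illustration only, not text from the draft: the `Mode` flags, the `choose_mode` name and the RC-before-UC preference order are all assumptions standing in for "the supported modes and the local policy".

```python
from enum import Flag, auto


class Mode(Flag):
    """Capability flags an interface might advertise (names are hypothetical)."""
    UD = auto()
    RC = auto()
    UC = auto()


def choose_mode(local: Mode, peer: Mode, preference=(Mode.RC, Mode.UC)) -> Mode:
    """Return the first preferred connected mode both sides advertise.

    If the two sides share no connected mode, fall back to UD, which
    every interface is assumed to support.
    """
    common = local & peer
    for mode in preference:      # local policy: order of preference
        if common & mode:        # both sides advertise this mode
            return mode
    return Mode.UD               # incompatible policies: use UD


if __name__ == "__main__":
    print(choose_mode(Mode.UD | Mode.RC, Mode.UD | Mode.UC))
```

With one side advertising UD+RC and the other UD+UC there is no common connected mode, so the pair ends up on UD, which is the "incompatible policies" case described in 4b.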


    5. MTU negotiation

            In the private data field of the CM message the desired MTU is
            included.

            It was suggested during the IPoIB meeting at IETF that it need not be
            symmetric. That is a good idea. Thus each peer declares the max MTU it
            prefers


            REQ: <my desired MTU>
            REP: <my desired MTU>
            RTU:

    Rephrase this as maximum logical MTU to avoid confusion with the IB link MTU.  If you start down this path, then you may need to also consider an exchange of what range of DiffServ code points to use as well.  Not clear that anyone needs to deal with any latency or bandwidth guarantees but the "camel's nose is starting to enter the tent" as the saying goes.
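
The asymmetric REQ/REP exchange in item 5 can be sketched roughly as below. This is a toy model, not the CM wire format: `CmMessage`, `negotiate` and the result keys are hypothetical names, and it assumes each declared value is read as the largest datagram that peer is willing to receive.

```python
from dataclasses import dataclass


@dataclass
class CmMessage:
    kind: str          # "REQ" or "REP"
    desired_mtu: int   # value carried in the CM private data (layout not modelled)


def negotiate(requester_mtu: int, responder_mtu: int) -> dict:
    """Exchange desired MTUs and return each side's resulting send limit.

    Assumption: a peer's declared MTU bounds what the *other* side may
    send to it, so the two directions can end up with different limits.
    """
    req = CmMessage("REQ", requester_mtu)
    rep = CmMessage("REP", responder_mtu)
    return {
        "requester_send_limit": rep.desired_mtu,
        "responder_send_limit": req.desired_mtu,
    }


if __name__ == "__main__":
    print(negotiate(65536, 32768))
```

Nothing forces the two limits to match, which is the point of dropping the symmetry requirement.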


    6. Multiple connections for the same IP address

            Local decision. Note that the peer might choose to not honour multiple
            connections.

    Agreed.
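
As a sketch of what "local decision" in item 6 could look like on the responder side, the Python below caps the number of connections accepted per peer. The `Responder` class, the `max_per_peer` knob and the string peer identifiers are assumptions made for illustration; a real implementation would act on CM requests rather than method calls.

```python
from collections import defaultdict


class Responder:
    """Accepts connection requests up to a locally configured per-peer limit."""

    def __init__(self, max_per_peer: int = 1):
        self.max_per_peer = max_per_peer   # local policy, not negotiated
        self.active = defaultdict(int)     # peer identifier -> accepted connections

    def on_request(self, peer: str) -> bool:
        """Return True if the request is accepted, False if it is rejected."""
        if self.active[peer] >= self.max_per_peer:
            return False                   # peer does not honour more connections
        self.active[peer] += 1
        return True


if __name__ == "__main__":
    r = Responder(max_per_peer=2)
    print([r.on_request("peer-a") for _ in range(3)])
```

A requester that wants N connections simply has to cope with anywhere from zero to N of them being accepted.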

    Mike




    Vivek





    On Wed, 17 Nov 2004, Michael Krause wrote:

    > At 11:38 PM 11/16/2004, Vivek Kashyap wrote:
    >
    >
    >
    > >Hi,  I have a couple of questions relative to IPoIB:  1.
    > >draft-ietf-ipoib-ip-over-infiniband-07.txt states: "Every IPoIB interface
    > >MUST "FullMember" join the IB multicast group defined by the
    > >broadcast-GID."  Isn't the broadcast group for IPv4 ? When the IPoIB
    > >interface is IPv6 only, does this group still need be joined ?  If not,
    > >where do the parameters for any IPv6 groups come from ? I am presuming
    > >that this group needs to be joined in  the IPv6 only case. I just want to
    > >be sure.
    > ><VK> Yes, the broadcast-GID is at the InfiniBand layer and MUST be joined
    > >whether you are running at v4 or v6 layer. <VK>  2. Also, what is the
    > >latest status of the Vivek's connected mode draft ? Will it be moving
    > >forward ?  <VK> I'll be submitting it as
    > >draft-ietf-ipoib-connected-mode-00.txt by the end of the month. There were
    > >some interesting suggestions that were made during the IETF WG meeting.
    > >Two of the suggestions of consequence are given below. The others we can
    > >discuss when the minutes are published (they include some additional
    > >requests on clarification on the transmission draft too).  a. The current
    > >draft makes the various modes mutually exclusive i.e. RC, UC and UD are
    > >not allowed simultaneously in the same IP subnet. The thought is that it
    > >is a link characteristic and hence different per connection mode. It was
    > >suggested that one be allowed to mix up RC/UC. This goes back to the
    > >original suggestion in the first draft which was:  IPoIB-UD must always be
    > >supported. Additionally, the interface can also support either both of RC
    > >and UC, or one of them. Or neither of them.
    > >
    > >UD MUST always be supported.
    > >
    > ><VK> That is and has always been the requirement right from the first
    > >draft. <VK>
    > >
    > >I personally don't care whether one does RC or UC but I don't think both
    > >are required as a MAY option. The advantage of RC is the send credit
    > >algorithm. The advantage of UC is the lack of ACK packets. ACK is noise in
    > >the fabric while send credits provide a simple method to maintain
    > >bandwidth / injection control on a per flow basis.
    > >
    > >I see no problems with supporting both UD and *C on the same subnet; it is
    > >rather foolish to attempt to mandate these be on separate subnets.
    > ><VK> As per the connected-mode draft the UD mechanism is *always*
    > >required; address resolution depends on it.
    > >
    > >The only point of discussion is whether all nodes must support the same
    > >link characteristics in the subnet i.e. all are RC (and UD), or all are UC
    > >(and UD), or all are UD only.
    >
    > Obviously I would oppose such a solution as it creates artificial
    > constraints with little benefit.
    >
    > >The alternative is to allow all the nodes to be mixed up with some nodes
    > >being RC/UD, others UC/UD and a third set UD only and yet others probably
    > >supporting all. within the same IP subnet. [Can the same serviceID be used
    > >by both RC and UC ?]
    > >
    > >The third alternative is to associate UD only or UD + one of RC or UC on
    > >the same interface. In such a case if mismatched/unsupported connected
    > >modes are supported by two nodes then they fall back to UD. This option is
    > >not too different from UD QP + RC or UC mechanism.
    >
    > KISS:
    >
    > - UD universal
    > - *C opportunistic
    >          - Local management issue to control what is sent on the *C
    > interface.  No need to specify
    >          - Advertise whether one or more ports are supported by UD or *C
    >          - Advertise whether one or more QP are supported by UD or *C
    >          - Let local management determine policy for what services are
    > mapped where - no need to specify
    >
    > This is both an interoperable approach and simple to implement.  There may
    > be some desire to add a policy interface to state preference for specific
    > types of traffic over a given QP.  I would not oppose this but would view
    > this as a separate draft once the basics are worked out.
    >
    >
    >
    > ><VK>
    > >b. Another suggestion was to allow multiple connected mode links (i.e. at
    > >IB UC/RC level) between peers.  One thought can be 'yes, but user beware':
    > >The IB connections are made using the service ID that is derived from the
    > >QPN as described in the draft. If a second attempt succeeds then there are
    > >two links. It is up to the implementation to either allow or disallow
    > >multiple links.
    > >
    > >Again, this has been suggested in the past (though most who were involved
    > >in the original discussions years gone by are likely gone since much of
    > >this discussion occurred before the IETF workgroup was established).
    > >
    > ><VK> I'm one of the vestiges of those early times along with you and a few
    > >others...so we have hope :). <VK>
    > >
    > >There is obvious benefit to supporting multiple RC per endnode pair. I do
    > >not see any technical reason to oppose nor any issue from an
    > >interoperability perspective. There is no reason for a "user beware".
    > >
    > ><VK> It is not opposed. The 'user beware' is only underscoring that the
    > >peer interface might not support multiple links; it might enforce a
    > >limited number of connections (maybe only one) between a pair of GIDs.
    > >Similarly, an implementation not wanting to support multiple links MUST
    > >take steps to deny multiple requests.
    >
    > *C requires CM to operate thus it is a local issue whether additional CM
    > operations are accepted or not.  A given requester node may issue N and a
    > given responder may state 0-N as an implementation may limit the number of
    > *C available for IP traffic.
    >
    >
    > ><VK>
    > >
    > >The work is rather straightforward to do and implement, and the benefit
    > >to customers is, again, rather obvious when one considers what the IB
    > >fabric offers and how connections can enable flows through multipath as
    > >well as transparent fail-over, flow scheduling, mapping of DiffServ to
    > >different arbitration / paths, etc.
    > >
    > ><VK> In addition Large MTU and APM are two of the main reasons why I've
    > >been proposing IPoIB-connected mode for so long. In terms of IPoIB itself,
    > >except for the Large MTU, the parameters are hidden from it.<VK>
    >
    > Mike

    __

    Vivek Kashyap
    Linux Technology Center, IBM


    _______________________________________________
    IPoverIB mailing list
    IPoverIB <at> ietf.org
    https://www1.ietf.org/mailman/listinfo/ipoverib
    
    H.K. Jerry Chu | 18 Nov 2004 20:27

    comments on draft-kashyap-ipoib-connected-mode-02.txt

    At the last IETF61 IPoIB meeting I made several comments on the
    connected mode draft. I'm sending them to the list for a general
    discussion. (Yes, I saw some discussion on the connected mode
    draft already. I'll try to catch up with the thread after this
    mail.)
    
    1. The draft makes a distinction between IPoIB-CM interfaces
    and IPoIB-UD interfaces, and portrays IPoIB-UC or IPoIB-RC as
    separate subnets superimposed on top of an IPoIB-UD subnet.
    
    For the above to work, due to a lack of multicast support, a fully
    connected network by itself can't meet the requirement of an IP
    link unless multicast is fully emulated through the use of
    multiple unicasts. The latter is complex and cumbersome.
    
    A much simpler model, which I think was presented in earlier
    drafts, is to fold the use of IB connections fully into a
    regular IPoIB-UD subnet, allowing any two IPoIB nodes to
    optionally negotiate the use of IB connection between themselves.
    
    This much simplified model is not without its drawbacks. Some
    nice IP link attributes are no longer uniform across a link.
    E.g., the link MTU now becomes per-node-pair MTU. Moreover,
    the MTU size for multicast will be different from the MTU size
    for unicast if IB connections are used. IB UC/RC may exhibit
    different RAS, flow control, QoS or other link characteristics
    than UD. But I consider these problems a reasonable price to
    pay for a seamless support of UC/RC mode in an IPoIB link
    defined by UD.
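
A small Python sketch of the per-node-pair MTU consequence described above. It is illustrative only: `effective_mtu`, the `conn_mtu` table and the node names are hypothetical, and the 2044-byte UD value is just an example figure (a 2048-byte IB MTU less a 4-byte encapsulation header).

```python
def effective_mtu(dest: str, is_multicast: bool, conn_mtu: dict, ud_mtu: int = 2044) -> int:
    """Return the MTU to use when sending to `dest`.

    Multicast always goes over UD. Unicast uses the MTU of the IB
    connection to that node if one has been negotiated, else the UD MTU.
    """
    if is_multicast:
        return ud_mtu
    return conn_mtu.get(dest, ud_mtu)


if __name__ == "__main__":
    connections = {"node-b": 65520}   # negotiated per-connection MTUs (made-up value)
    print(effective_mtu("node-b", False, connections))
    print(effective_mtu("node-b", True, connections))
    print(effective_mtu("node-c", False, connections))
```

The host stack therefore has to look up the MTU per destination rather than once per link, which is the kind of change in the host stack this model requires.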
    
    2. The negotiation of the per-connection MTU seems more
    complicated than necessary. I think all that is needed is for a
    node to advertise its own "receive MTU". That is, the MTU
    size its peer should never go over when sending packets
    to the local interface. Yes, this may break the traditional
    concept of "symmetric" MTUs. But we're already breaking the
    notion of per-link MTU, requiring a lot of changes in the host
    stack anyway. This additional breakage doesn't seem like much.
    
    I haven't verified if this asymmetric MTU matches well with
    IBA connections though.
    
    3. Regarding allowing multiple IB connections between a node
    pair: given an IP address there is only one link-address
    for it, implying one QPN and hence one service-ID. If a single
    service-ID can be used to create multiple IB connections,
    then this can happen transparently. Otherwise we've got a
    problem.
    
    Jerry
    
