Vivek Kashyap | 20 Jul 2004 01:46

IPoIB - connected mode


I've submitted a draft describing IP over the connected modes of InfiniBand. It covers both the reliable and the
unreliable connected modes. As has been discussed before, IPoIB over connected mode provides a large MTU and
Automatic Path Migration (APM), allowing for better performance as well as link failover. Other aspects are included in the draft.

Vivek
--------------------------------

A New Internet-Draft is available from the on-line Internet-Drafts directories.


                Title     : IP over InfiniBand: Connected Mode
                Author(s) : V. Kashyap
                Filename  : draft-kashyap-ipoib-connected-mode-02.txt
                Pages     : 10
                Date      : 2004-7-19
               
This document specifies a method for transmitting IPv4/IPv6
       packets and address resolution over the connected modes of
       InfiniBand.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-kashyap-ipoib-connected-mode-02.txt
--
Vivek Kashyap
Linux Technology Center, IBM
vivk <at> us.ibm.com
kashyapv <at> us.ibm.com
Ph: 503 578 3422 T/L: 775 3422
_______________________________________________
IPoverIB mailing list
IPoverIB <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ipoverib
Hal Rosenstock | 24 Jul 2004 23:14

Some Questions on the IPoIB - Connected Mode Draft

Hi,
 
I just finished reading the "IPoIB connected mode" I-D (http://www.ietf.org/internet-drafts/draft-kashyap-ipoib-connected-mode-02.txt) and have some questions:
 
1. In 2.2 Outline of Address Resolution, it is stated that:
Every IPoIB-CM interface MUST have two QPs associated with it:
                1) A connected mode QP
                2) An unreliable datagram mode QP
I'm a little confused by the description of the first QP. Is this really a UD QP used for connected mode address resolution (when
the interface is not in the same scope and partition as the UD one)? It is also later stated (in 3.0 Address Resolution) that IPoIB-CM
can use the same UD QP as the one used by the IPoIB-UD interface when they share the same partition and scope, so it seems this
contradicts the earlier statement. It seems like the UD QP for connected mode is only required when the same partition and scope
are not shared with the IPoIB-UD interface.

2. In 3.1 Link Layer Address, it is stated that the RC and UC flags are mutually exclusive. Is this a requirement or more a
configuration issue? Is there something which would break with both flags on? The only issue I see is if an IPoIB-CM subnet
includes some nodes supporting only RC and others only UC. That couldn't be mixed. I suppose setting both opens the door
for that to occur.

3. In 3.2 IB Connection Setup, it is stated that the node SHOULD NOT attempt another connection to the remote peer
using the same service ID as for an already existing connection. Wouldn't that potentially be useful for support of QoS?
Is this a recommendation (SHOULD NOT) just to reduce the number of connections (and QPs)?

4. In 3.3 Service ID, wouldn't it be better if the QPN didn't overlap a 32-bit boundary?

5. In 5.1 Per Connection MTU, why does the peer need to REJ the connection if the REQ desired MTU is not acceptable
(and is more than the minimum MTU)? Couldn't we REP with the "MTU granted" for that case? The active side which
issued the REQ could always REJ the REP if it didn't like the "MTU granted".

-- Hal
Dror Goldenberg | 25 Jul 2004 23:33

RE: IPoIB - connected mode - Question about connection establishment

From section 3.1:
 
                This is a single octet field. If bit 0 is set then it
                implies that in the sender's view,the subnet is built
                over IB's 'reliable connected' i.e. RC mode. If bit 1 is
                set then it implies that the subnet is built over IB's
                "unreliable connected" i.e. UC mode. All other bits in
                the octet are reserved and MUST be set to 0.
                ...
                Both the RC and UC flags MUST not be set at the same
                time.  They are mutually exclusive.
                ...
                Note:
                    The above implies that a given IP subnet can only be
                    supported on one of the InfiniBand modes at any
                    time. If the link layer includes no flags then it is
                    part of an IPoIB-UD subnet, if the link layer
                    includes the RC flag then it is part of an IPoIB-RC
                    subnet, if the link layer includes the UC flag then
                    it is part of an IPoIB-UC subnet.
 

What I had in mind was a slightly different model.
 
Each node participating in an IPoIB subnet can declare that it supports
connected modes beyond UD. For each mode it supports, it sets the UC bit,
the RC bit, or both in the ARP reply. If a node supports a
connected mode, then any other node in the subnet is free to create
a connection with that node through the CM. Obviously, a node that
initiates a connection to such a node must support the same connection
mode.
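
To make the contrast concrete, here is a minimal Python sketch of the two readings of the flags octet. It is only an illustration: the function and constant names are hypothetical, and it assumes that "bit 0" is the most significant bit of the octet (the usual IETF diagram convention), which the quoted text does not state explicitly.

```python
from typing import List

# Assumption: "bit 0" is the most significant bit of the flags octet.
RC_FLAG = 0x80        # bit 0: reliable connected (RC)
UC_FLAG = 0x40        # bit 1: unreliable connected (UC)
RESERVED_MASK = 0x3F  # all other bits are reserved and MUST be 0


def valid_under_draft(flags: int) -> bool:
    """Draft rule: reserved bits are zero and RC/UC are never both set."""
    if flags & RESERVED_MASK:
        return False
    both = RC_FLAG | UC_FLAG
    return (flags & both) != both


def common_modes(local_flags: int, remote_flags: int) -> List[str]:
    """Alternative model: connected modes that both nodes advertise.

    An empty list means only UD is available between the two nodes.
    """
    shared = local_flags & remote_flags
    modes = []
    if shared & RC_FLAG:
        modes.append("RC")
    if shared & UC_FLAG:
        modes.append("UC")
    return modes


if __name__ == "__main__":
    print(valid_under_draft(RC_FLAG | UC_FLAG))        # rejected by the draft rule
    print(common_modes(RC_FLAG | UC_FLAG, UC_FLAG))    # modes usable by both nodes
```

Under the draft rule an octet with both bits set is simply invalid; under the alternative model it is a capability advertisement, and the initiator picks any mode that appears in the intersection.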
 
 
-Dror
 
 
Dror Goldenberg | 25 Jul 2004 23:33

RE: Some Questions on the IPoIB - Connected Mode Draft

Hi Hal,
 
I can try to address items  4 and 5:
 
4. QPN crossing 32-bit boundary - I don't think that this is the case. If you count
    bytes from the MSB down to the LSB, then:
        byte 0   = IETF = 0x01
        byte 1   = Type = 0x00
        byte 2-3 = Rsvd = 0x0000
      ----- 32-bit boundary -----
        byte 4-6 = QPN[23:0]
        byte 7   = Rsvd = 0x00
      ----- 32-bit boundary -----

5. Rejection of REQ in case of non-matching MTU - I think that it may have to do with
    the fact that by the time you send the REQ, in some implementations you have
    already allocated your buffers and posted them on the WQ (while the QP
    was in INIT state). So, if you don't reject and thus force the QP to flush all these
    WQEs, you may end up doing some complicated buffer management operations,
    e.g. getting rid of the old posted buffers, allocating new ones and reposting them
    once the old buffers complete.
    Note that I am really not sure about this one...
 
 
-Dror
 
 
   
Hal Rosenstock | 26 Jul 2004 12:59

Re: Some Questions on the IPoIB - Connected Mode Draft

Hi Dror,
 
On #4, it looks to me like the service ID currently is:

The Service-IDs used by IPoIB will be in the format:

+--------+--------+--------+--------+--------+--------+--------+--------+
|00000001|  Type  |Reserved|           QPN            |    Reserved     |
+--------+--------+--------+--------+--------+--------+--------+--------+

        byte 0   = IETF = 0x01
        byte 1   = Type = 0x00
        byte 2   = Rsvd = 0x00
        byte 3-5 = QPN[23:0]
        byte 6-7 = Rsvd = 0x0000

as the first Rsvd field is a single byte and the second one two bytes.

It would be better the way you state:

+--------+--------+--------+--------+--------+--------+--------+--------+
|00000001|  Type  |    Reserved     |           QPN            |Reserved|
+--------+--------+--------+--------+--------+--------+--------+--------+

or

+--------+--------+--------+--------+--------+--------+--------+--------+
|00000001| Type | Reserved | QPN |
+--------+--------+--------+--------+--------+--------+--------+--------+
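
For illustration, a small Python sketch of the two layouts with known field widths (the current one as I read the draft, and the one with two reserved bytes before the QPN). The function names are made up for this example, and byte 0 is taken to be the most significant byte, as in the byte lists in this thread.

```python
IETF_PREFIX = 0x01  # first byte of the Service-ID, per the diagrams above


def _check(qpn: int, sid_type: int) -> None:
    # QPN is 24 bits; Type is a single byte.
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN must fit in 24 bits")
    if not 0 <= sid_type <= 0xFF:
        raise ValueError("Type must fit in one byte")


def service_id_current(qpn: int, sid_type: int = 0x00) -> bytes:
    """Layout as read from the -02 draft: QPN in bytes 3-5."""
    _check(qpn, sid_type)
    return bytes([IETF_PREFIX, sid_type, 0x00]) + qpn.to_bytes(3, "big") + bytes(2)


def service_id_aligned(qpn: int, sid_type: int = 0x00) -> bytes:
    """Suggested layout: two reserved bytes, then QPN in bytes 4-6."""
    _check(qpn, sid_type)
    return bytes([IETF_PREFIX, sid_type, 0x00, 0x00]) + qpn.to_bytes(3, "big") + bytes(1)


def crosses_32bit_boundary(first_byte: int, length: int = 3) -> bool:
    """True if a field starting at first_byte spans two 32-bit words."""
    last_byte = first_byte + length - 1
    return first_byte // 4 != last_byte // 4


if __name__ == "__main__":
    qpn = 0x123456
    print(service_id_current(qpn).hex(), crosses_32bit_boundary(3))
    print(service_id_aligned(qpn).hex(), crosses_32bit_boundary(4))
```

With the QPN starting at byte 3 it occupies the last byte of the first 32-bit word and the first two bytes of the second; starting at byte 4 it sits entirely in the second word.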
 
-- Hal
Dror Goldenberg | 26 Jul 2004 13:06

RE: Some Questions on the IPoIB - Connected Mode Draft

Hal,
 
Oops, you're right. I was reading it in my email client with a font that was not fixed-width.
So, I agree with what you say: it would be nice to have the QPN not cross a 32-bit boundary.
 
-Dror
bill | 27 Jul 2004 00:05

IETF 60 - IPoIB WG Agenda

1300-1500 Afternoon Sessions I
IP over Infiniband WG
10 min - Welcome, selection of minute taker        Bill
10 min - Agenda Bashing

20 min Status of IP Encapsulation Draft            Vivek
20 min Status of MIB drafts                        Bill

New business
30 min IP over Infiniband Connected Mode Draft     Vivek
30 min SM MIB                                      Cheng

Conclude the meeting

I will post full links to the drafts tomorrow. I just realized that this
probably bounced last week, and I don't have a copy here at work.

Bill
bill | 28 Jul 2004 19:00

(no subject)

1300-1500 Afternoon Sessions I
IP over Infiniband WG
10 min - Welcome, selection of minute taker        Bill
10 min - Agenda Bashing

10 min Status of MIB Drafts                        Bill
20 min IP over IB Implementation Report OpenIB     Hal Rosenstock
                                                   (Bill Presenting)
10 min Status of IP Encapsulation Draft            Vivek

New business
30 min IP over Infiniband Connected Mode Draft     Vivek
30 min SM MIB                                      Cheng

Conclude the meeting

I am adding an implementation report from the OpenIB work.  I would like
to thank Hal and the OpenIB participants for showing us how their
implementation is proceeding.

Bill
Vivek Kashyap | 29 Jul 2004 00:33

Re: Some Questions on the IPoIB - Connected Mode Draft

Hal,

Thanks for your comments. My reply within <VK> below.

Vivek
--
Vivek Kashyap
Linux Technology Center, IBM
vivk <at> us.ibm.com
kashyapv <at> us.ibm.com
Ph: 503 578 3422 T/L: 775 3422


Hi,

I just finished reading the "IPoIB connected mode" I-D (http://www.ietf.org/internet-drafts/draft-kashyap-ipoib-connected-mode-02.txt) and have some questions:

1. In 2.2 Outline of Address Resolution, it is stated that:
Every IPoIB-CM interface MUST have two QPs associated with it:
1) A connected mode QP
2) An unreliable datagram mode QP
I'm a little confused by the description of the first QP. Is this really a UD QP used for connected mode address resolution (when
the interface is not in the same scope and partition as the UD one) ? It is also later stated (in 3.0 Address Resolution) that IPoIB-CM
can use the same UD QP as the one used by IPoIB-UD interface when they share the same partition and scope so it seems this
contradicts the earlier statement. It seems like the UD QP for connected mode is only required when the same partition and scope
are not shared with the IPoIB-UD interface.

<VK>
Connected mode does not support multicasting. Address resolution is best implemented as a broadcast/multicast service for IP. Therefore, the idea is to use an unreliable datagram QP for address resolution - the 2nd QP mentioned. The first QP is the connected mode QP (RC or UC) over which the data transfer will occur after address resolution. That is, there are two types of QPs required when implementing IPoIB-CM: (1) a connected mode QP and (2) an unreliable datagram QP.

It is further suggested that one MAY choose to use the same UD QP as used for IPoIB-UD for address resolution but it is not required that one do so. One can certainly use a different UD QP. This follows from the broadcast GID being the same in all cases if the IP version, scope and P_Key are the same.

<VK>

2. In 3.1 Link Layer Address, it is stated that the RC and UC flags are mutually exclusive. Is this a requirement or more a
configuration issue ? Is there something which would break with both flags on ? The only issue I see is if an IPoIB-CM subnet
includes some nodes supporting only RC and others only UC. That couldn't be mixed. I suppose setting both opens the door
for that to occur.

<VK> Yes, exactly. Therefore the draft keeps the two separate. <VK>

3. In 3.2 IB Connection Setup, it is stated that the node SHOULD NOT attempt another connection to the remote peer
using the same service ID as for an already existing connection. Wouldn't that potentially be useful for support of QoS ?
Is this a recommendation (SHOULD NOT) just to reduce the number of connections (and QPs) ?

<VK> The IB level connection link is between two machines. The peer might return the same QP, reject the request or grant another as per IB. The caller needs to keep track of this in its internal tables. Additionally, the respondent should do the same if it accepts multiple IB connections. These aspects are hidden from the IP layer. Hence, though not disallowed since it can be useful, multiple IB connections must be handled carefully by the user.<VK>


4. In 3.3 Service ID, wouldn't it be better if the QPN didn't overlap a 32 bit boundary ?

<VK> The QPN is 24 bits though we could move it by a byte. <VK>


5. In 5.1 Per Connection MTU, why does the peer need to REJ the connection if the REQ desired MTU is not acceptable
(and is more than the minimum MTU) ? Couldn't we REP with the "MTU granted" for that case ? The active side which
issued the REQ could always REJ the REP if it didn't like the "MTU granted".

<VK> I'd suggest the following to make it a very simple transaction.

The private data field includes only one value, the 'Desired MTU'. The 'minimum MTU' is the 'link MTU', i.e. the value received on joining the broadcast-GID (which will be known to all nodes). If the receiver cannot accept the Desired MTU then it responds with the 'link MTU'. If it can then it responds with the 'Desired MTU'.

An alternative, based on your comment above, is to let the response be any value between the minimum and the 'desired' MTU. The original requester then has the option of either accepting that value or responding with the 'Minimum'.

REJ is a bad option since the nodes belong to the same 'link' and hence should be able to talk to one another. REJ can be used for error cases only.
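
As a rough sketch of the two response rules described above (not text from the draft), the receiver's choice could look like this in Python. The names are hypothetical; `local_max_mtu` stands for the largest MTU the receiver is able to accept, and it is assumed to be at least the link MTU.

```python
def rep_mtu_simple(desired_mtu: int, link_mtu: int, local_max_mtu: int) -> int:
    """Simple rule: grant the Desired MTU if acceptable, else the link MTU."""
    if desired_mtu <= local_max_mtu:
        return desired_mtu
    return link_mtu


def rep_mtu_any_value(desired_mtu: int, link_mtu: int, local_max_mtu: int) -> int:
    """Alternative rule: grant any value between the minimum and 'desired'.

    Here the receiver grants the largest value it can accept in that range.
    """
    return max(link_mtu, min(desired_mtu, local_max_mtu))


if __name__ == "__main__":
    # Example values only; they are not taken from the draft.
    print(rep_mtu_simple(65536, 2044, 32768))
    print(rep_mtu_any_value(65536, 2044, 32768))
```

In both cases the connection is never refused because of the MTU; the two rules differ only in whether the receiver may grant an intermediate value.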

<VK>


-- Hal
Hal Rosenstock | 29 Jul 2004 03:06

Re: Some Questions on the IPoIB - Connected Mode Draft

Hi Vivek,
 
Thanks for your reply. I have a few <hnr>...</hnr> follow up comments below.
 
-- Hal
----- Original Message -----
Sent: Wednesday, July 28, 2004 6:33 PM
Subject: Re: [Ipoverib] Some Questions on the IPoIB - Connected Mode Draft


Hi,

I just finished reading the "IPoIB connected mode" I-D (http://www.ietf.org/internet-drafts/draft-kashyap-ipoib-connected-mode-02.txt) and have some questions:

1. In 2.2 Outline of Address Resolution, it is stated that:
Every IPoIB-CM interface MUST have two QPs associated with it:
1) A connected mode QP
2) An unreliable datagram mode QP
I'm a little confused by the description of the first QP. Is this really a UD QP used for connected mode address resolution (when
the interface is not in the same scope and partition as the UD one) ? It is also later stated (in 3.0 Address Resolution) that IPoIB-CM
can use the same UD QP as the one used by IPoIB-UD interface when they share the same partition and scope so it seems this
contradicts the earlier statement. It seems like the UD QP for connected mode is only required when the same partition and scope
are not shared with the IPoIB-UD interface.

<VK>
Connected mode does not support multicasting. Address resolution is best implemented as a broadcast/multicast service for IP. Therefore, the idea is to use an unreliable datagram QP for address resolution - the 2nd QP mentioned. The first QP is the connected mode QP (RC or UC) over which the data transfer will occur after address resolution. That is, there are two types of QPs required when implementing IPoIB-CM: (1) a connected mode QP and (2) an unreliable datagram QP.

It is further suggested that one MAY choose to use the same UD QP as used for IPoIB-UD for address resolution but it is not required that one do so. One can certainly use a different UD QP. This follows from the broadcast GID being the same in all cases if the IP version, scope and P_Key are the same.

<VK>

<hnr> It might be clearer as:

Every IPoIB-CM interface MUST have two or more QPs associated with it:
1) One or more connected mode QPs
2) An unreliable datagram mode QP for address resolution

</hnr>


2. In 3.1 Link Layer Address, it is stated that the RC and UC flags are mutually exclusive. Is this a requirement or more a
configuration issue ? Is there something which would break with both flags on ? The only issue I see is if an IPoIB-CM subnet
includes some nodes supporting only RC and others only UC. That couldn't be mixed. I suppose setting both opens the door
for that to occur.

<VK> Yes, exactly. Therefore the draft keeps the two separate. <VK>

3. In 3.2 IB Connection Setup, it is stated that the node SHOULD NOT attempt another connection to the remote peer
using the same service ID as for an already existing connection. Wouldn't that potentially be useful for support of QoS ?
Is this a recommendation (SHOULD NOT) just to reduce the number of connections (and QPs) ?

<VK> The IB level connection link is between two machines. The peer might return the same QP, reject the request or grant another as per IB. The caller needs to keep track of this in its internal tables. Additionally, the respondent should do the same if it accepts multiple IB connections. These aspects are hidden from the IP layer. Hence, though not disallowed since it can be useful, multiple IB connections must be handled carefully by the user.<VK>

<hnr> I don't understand how the peer could return the same QP as RC and UC QPs are 1:1. I think it either accepts the connection and uses another QP or rejects it. </hnr>


4. In 3.3 Service ID, wouldn't it be better if the QPN didn't overlap a 32 bit boundary ?

<VK> The QPN is 24 bits though we could move it by a byte. <VK>


5. In 5.1 Per Connection MTU, why does the peer need to REJ the connection if the REQ desired MTU is not acceptable
(and is more than the minimum MTU) ? Couldn't we REP with the "MTU granted" for that case ? The active side which
issued the REQ could always REJ the REP if it didn't like the "MTU granted".

<VK> I'd suggest the following to make it a very simple transaction.

The private data field includes only one value, the 'Desired MTU'. The 'minimum MTU' is the 'link MTU', i.e. the value received on joining the broadcast-GID (which will be known to all nodes). If the receiver cannot accept the Desired MTU then it responds with the 'link MTU'. If it can then it responds with the 'Desired MTU'.

An alternative, based on your comment above, is to let the response be any value between the minimum and the 'desired' MTU. The original requester then has the option of either accepting that value or responding with the 'Minimum'.

REJ is a bad option since the nodes belong to the same 'link' and hence should be able to talk to one another. REJ can be used for error cases only.

<VK>

<hnr> Your idea is simpler and better. The problem with my idea is that the original requester will need to send private data in the RTU indicating the MTU he accepts and the RTU cannot be relied upon. It may be lost so it is a bad idea to rely on it. </hnr>

-- Hal