Connection Startup
Caitlin Bestler <cait <at> asomi.com>
2003-09-30 14:36:25 GMT
On Monday, September 29, 2003, at 02:12 PM, Black_David <at> emc.com wrote:
> Now that I've done my job in getting the problem statement
> and architecture documents to the ADs for publication, it's
> high time for the WG to get back to work from summer
> vacation ...
>
> ... I've seen very little list traffic on technical issues
> over the past month or so, and if I were to judge solely
> by that, I'd have a hard time justifying meeting time
> in Minneapolis. Now, I know better - we have serious open
> technical issues, and hence I'd like to see some comments/
> proposals/discussion on at least the following:
>
> (1) Connection startup. After all the discussion about
> minimization of round trips, I was shocked to see
> a 3 round-trip startup sequence presented in Vienna.
> The MPA draft authors are responsible for getting
> this down to something more reasonable, and comments
> on the list about how it should be done are welcome.
With the introduction of the start frame, there is no
need for extra delays during connection setup. What is
proposed works fine; the only adjustment required is
to remove some of the ordering rules. They are not
necessary and can increase the number of round trips.
A quick review:
Starting MPA mode has always been easy when both ends
naturally know the point in the TCP stream where MPA
should begin.
This is easily accomplished when that start point is
zero (i.e., the stream is placed in MPA mode immediately)
or at the conclusion of an existing stream-mode negotiation
(as is proposed for iSER).
The problem case that was raised earlier was when the
need for negotiation is RDMA specific. Specifically,
some private data must be exchanged between the ULP
peers in order to ensure that the RDMA Endpoints are
selected/configured compatibly.
Earlier I had proposed including Private Data within
the MPA Start Frame to accomplish this. The MPA draft
authors counter-suggested an optional, but standard,
negotiation frame that could precede the MPA Start
frame. Implicitly, the motivation for separating the
negotiation frame from the start frame was to allow
for the negotiation frame to be handled by drivers
and/or middleware rather than by the hardware/firmware.
While I still think the single Start Frame with
variable length Private Data is a better solution,
the optional negotiation frame is an acceptable
solution once unnecessary ordering rules are removed.
The key problem with *not* having a standard method
to exchange Private Data is that without one you cannot
exchange variable length Private Data without doing a
very tricky round-trip up to the ULP immediately before
entering MPA mode.
Consider the case where there is a need to send a variable
length message immediately before the MPA Start Frame.
Without a standard format, the ULP must consume and
parse those bytes, and only *then* place the stream in
MPA receive mode.
That calls for a byte-by-byte fetch from the stream, a
putback capability, or both. Either is contrary to the goal
of optimizing performance on non-MPA TCP streams.
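The difficulty goes away if the layer below the ULP can find the end
of the variable length message on its own. As a minimal sketch, assume
a frame that carries a 2-byte big-endian length followed by the private
data. That layout, and the function names, are assumptions made purely
for illustration; they are not taken from the MPA draft.

```python
import struct

# Hypothetical layout (NOT the MPA draft's): a 2-byte big-endian
# length field followed by that many bytes of private data.
HEADER = struct.Struct("!H")


def encode_negotiation_frame(private_data: bytes) -> bytes:
    """Prefix the private data with its length."""
    if len(private_data) > 0xFFFF:
        raise ValueError("private data too long for a 16-bit length field")
    return HEADER.pack(len(private_data)) + private_data


def split_negotiation_frame(stream: bytes):
    """Return (private_data, remainder), or None if the frame is incomplete.

    The remainder starts exactly at the byte after the frame, so the
    receiver knows where the next frame begins without the ULP having
    to parse anything or push bytes back.
    """
    if len(stream) < HEADER.size:
        return None
    (length,) = HEADER.unpack_from(stream)
    end = HEADER.size + length
    if len(stream) < end:
        return None
    return stream[HEADER.size:end], stream[end:]
```

With a known length up front, whatever follows the frame in the same
TCP segment can be handed straight to the MPA receive path.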
A standardized negotiation frame allows the ULP peers to
place their streams in MPA mode (and bind them with RDMA
endpoints) before the variable length private data message.
This works well under all of the following scenarios:
Instant Startup: ULP peers have pre-designated the connection
to be in MPA mode. They each send an MPA Start Frame immediately
to confirm this.
Post-streaming-mode Startup: After conducting streaming mode
negotiations over the TCP connections, the ULP peers have reached
the same state as an Instant Startup. They can each send MPA
Start frames immediately. The end that completed the negotiation
can even place its MPA Start Frame in the same TCP segment as
its final streaming mode response.
IB CM-compatible startup:
Active side sends a Negotiation Frame with its private data.
It also pre-designates the RDMA endpoint to be used.
Passive side consumes the Negotiation Frame and passes it to
the ULP as an event.
Passive side uses the private data to select/configure its
endpoint. It responds by sending its responding negotiation
frame, associating its endpoint and sending its MPA Start
Frame. The MPA Start Frame SHOULD be in the same TCP segment
as the Negotiation Frame when possible.
Active Side consumes Negotiation Frame, and immediately places
the TCP connection in MPA mode (sending its MPA Start Frame
and consuming the one sent to it). The previously designated
RDMA endpoint is associated, and will handle all DDP Segments
that are after the MPA Start Frame. The peer MPA Start frame
can be consumed immediately. The private data from the response
negotiation frame is reported to the ULP.
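The IB CM-compatible exchange above can be traced with a small model.
This is only a sketch: the frame tags, the Side class and the endpoint
strings are hypothetical names, and "segments" here are just lists of
frames, not real TCP segments.

```python
from dataclasses import dataclass, field

NEG, START = "NEG", "START"  # illustrative frame tags, not wire formats


@dataclass
class Side:
    name: str
    endpoint: str = ""          # RDMA endpoint bound to the stream
    sent_start: bool = False
    got_start: bool = False
    ulp_events: list = field(default_factory=list)

    def can_send_fulpdus(self) -> bool:
        # Both Start Frames must be in place before FULPDUs flow.
        return self.sent_start and self.got_start


def ib_cm_style_startup(active_private: bytes, passive_private: bytes):
    active, passive = Side("active"), Side("passive")
    segments = []

    # Active side: pre-designate the endpoint, send the Negotiation Frame.
    active.endpoint = "pre-designated"
    segments.append(("active", [(NEG, active_private)]))

    # Passive side: report private data to the ULP, pick an endpoint,
    # then answer with Negotiation Frame + Start Frame in one segment.
    passive.ulp_events.append(active_private)
    passive.endpoint = "selected-from-private-data"
    passive.sent_start = True
    segments.append(("passive", [(NEG, passive_private), (START, None)]))

    # Active side: consume the response and the peer Start Frame,
    # report the private data, and send its own Start Frame.
    active.ulp_events.append(passive_private)
    active.got_start = True
    active.sent_start = True
    segments.append(("active", [(START, None)]))

    # Passive side sees the active Start Frame.
    passive.got_start = True
    return active, passive, segments
```

In this model the active side may follow its Start Frame with FULPDUs
right away, since it has already received the peer Start Frame along
with the negotiation response.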
The simplified rules therefore are:
- An implementation MUST NOT send the MPA Start Frame until
it is ready to process FULPDUs starting at the agreed
point in the TCP stream.
- An implementation MUST NOT send FULPDUs prior to receiving
an MPA Start Frame from its peer.
These two rules ensure that there is an RDMA Endpoint
associated with the stream before any DDP Segments are sent.
- An implementation MUST NOT send FULPDUs earlier in its
output TCP Stream than the MPA Start Frame.
- An implementation MAY send zero or more Negotiation Frames
prior to the MPA Start Frame.
- If an implementation does not receive the expected
MPA Start Frame at the agreed point in the TCP Stream,
it MUST terminate the connection.
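One way to read the rules above is as a set of guards on a single
side of the connection. The sketch below is an assumption-laden model:
the class and method names are hypothetical, and the check that
Negotiation Frames are not sent after the Start Frame is an
interpretation of "prior to", not a stated MUST NOT.

```python
class MpaStartupError(Exception):
    """Raised when a send would violate one of the startup rules."""


class StartupRules:
    def __init__(self):
        self.ready_for_fulpdus = False  # local RDMA endpoint is bound
        self.sent_start = False
        self.received_start = False
        self.terminated = False

    def mark_ready(self):
        self.ready_for_fulpdus = True

    def send_negotiation_frame(self):
        # MAY send zero or more, prior to the MPA Start Frame.
        if self.sent_start:
            raise MpaStartupError("Negotiation Frame after MPA Start Frame")

    def send_start_frame(self):
        # MUST NOT send until ready to process FULPDUs.
        if not self.ready_for_fulpdus:
            raise MpaStartupError("not ready to process FULPDUs")
        self.sent_start = True

    def send_fulpdu(self):
        # MUST NOT precede our own Start Frame in the output stream.
        if not self.sent_start:
            raise MpaStartupError("FULPDU before our MPA Start Frame")
        # MUST NOT send before receiving the peer's Start Frame.
        if not self.received_start:
            raise MpaStartupError("FULPDU before peer MPA Start Frame")

    def receive_at_start_point(self, frame_kind: str):
        # Anything other than a Start Frame at the agreed point
        # terminates the connection.
        if frame_kind != "START":
            self.terminated = True
            return
        self.received_start = True
```

A caller would invoke the matching method before each send and treat
MpaStartupError as a local programming error, while terminated signals
that the connection must be torn down.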