Internet-Drafts | 9 Sep 2003 19:18

I-D ACTION:draft-ietf-rddp-arch-03.txt

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Remote Direct Data Placement Working Group of the IETF.

	Title		: The Architecture of Direct Data Placement (DDP) and
                          Remote Direct Memory Access (RDMA) on Internet
                          Protocols
	Author(s)	: S. Bailey, T. Talpey
	Filename	: draft-ietf-rddp-arch-03.txt
	Pages		: 18
	Date		: 2003-9-9
	
This document defines an abstract architecture for Direct Data
Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to
run on Internet Protocol-suite transports.  This architecture does
not necessarily reflect the proper way to implement such protocols,
but is, rather, a descriptive tool for defining and understanding
the protocols.  DDP allows the efficient placement of data into
buffers designated by Upper Layer Protocols (e.g. RDMA).  RDMA
provides the semantics to enable Remote Direct Memory Access
between peers in a way consistent with application requirements.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-rddp-arch-03.txt

To remove yourself from the IETF Announcement list, send a message to 
ietf-announce-request with the word unsubscribe in the body of the message.

Internet-Drafts are also available by anonymous FTP. Login with the username
"anonymous" and a password of your e-mail address. After logging in,
type "cd internet-drafts" and then

Talpey, Thomas | 10 Sep 2003 18:42

Re: I-D ACTION:draft-ietf-rddp-arch-03.txt

FYI, the changes in this most recent revision are very localized and reflect the
discussion from the document's Last Call around "shared receive queues". This
version restores the ddp_post_recv() and rdma_post_recv() which were introduced
in draft -01 (and removed in draft -02), along with some additional text clarifying
the behavior of ordering and resources on shared queues.

The relevant changes are in sections 2.1.2 and 2.1.3 around ddp_post_recv(),
and section 2.2.1 for rdma_post_recv().

Please post any comments on these additions to the list.

Tom.

At 01:18 PM 9/9/2003, Internet-Drafts <at> ietf.org wrote:
>A New Internet-Draft is available from the on-line Internet-Drafts directories.
>This draft is a work item of the Remote Direct Data Placement Working Group
>of the IETF.
>
>       Title           : The Architecture of Direct Data Placement (DDP) and
>                          Remote Direct Memory Access (RDMA) on Internet
>                          Protocols
>       Author(s)       : S. Bailey, T. Talpey
>       Filename        : draft-ietf-rddp-arch-03.txt
>       Pages           : 18
>       Date            : 2003-9-9
>      
>This document defines an abstract architecture for Direct Data
>Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to
>run on Internet Protocol-suite transports.  This architecture does
>not necessarily reflect the proper way to implement such protocols,
>but is, rather, a descriptive tool for defining and understanding
>the protocols.  DDP allows the efficient placement of data into
>buffers designated by Upper Layer Protocols (e.g. RDMA).  RDMA
>provides the semantics to enable Remote Direct Memory Access
>between peers in a way consistent with application requirements.
>
>A URL for this Internet-Draft is:
>http://www.ietf.org/internet-drafts/draft-ietf-rddp-arch-03.txt
>
>To remove yourself from the IETF Announcement list, send a message to
>ietf-announce-request with the word unsubscribe in the body of the message.
>
>Internet-Drafts are also available by anonymous FTP. Login with the username
>"anonymous" and a password of your e-mail address. After logging in,
>type "cd internet-drafts" and then
>       "get draft-ietf-rddp-arch-03.txt".
>
>A list of Internet-Drafts directories can be found in
>http://www.ietf.org/shadow.html
>or ftp://ftp.ietf.org/ietf/1shadow-sites.txt
>
>
>Internet-Drafts can also be obtained by e-mail.
>
>Send a message to:
>       mailserv <at> ietf.org.
>In the body type:
>       "FILE /internet-drafts/draft-ietf-rddp-arch-03.txt".
>      
>NOTE:  The mail server at ietf.org can return the document in
>       MIME-encoded form by using the "mpack" utility.  To use this
>       feature, insert the command "ENCODING mime" before the "FILE"
>       command.  To decode the response(s), you will need "munpack" or
>       a MIME-compliant mail reader.  Different MIME-compliant mail readers
>       exhibit different behavior, especially when dealing with
>       "multipart" MIME messages (i.e. documents which have been split
>       up into multiple messages), so check your local documentation on
>       how to manipulate these messages.
>              
>              
>Below is the data which will enable a MIME compliant mail reader
>implementation to automatically retrieve the ASCII version of the
>Internet-Draft.
>
>

Black, David | 13 Sep 2003 01:19

FW: RDDP WG publication request

FYI, --David

-----Original Message-----
From: Black, David 
Sent: Friday, September 12, 2003 7:19 PM
To: 'jon.peterson <at> neustar.biz'; 'mankin <at> psg.com'
Cc: 'iesg-secretary <at> ietf.org'; Black, David
Subject: RDDP WG publication request

Jon and Allison,

The RDDP WG requests that the IESG approve publication
of the following two drafts as Informational RFCs:

	RDMA over IP Problem Statement
		draft-ietf-rddp-problem-statement-02.txt

	The Architecture of Direct Data Placement (DDP) and
	Remote Direct Memory Access (RDMA) on Internet Protocols
		draft-ietf-rddp-arch-03.txt

These drafts are products of the RDDP WG.

Thanks,
--David

----------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
black_david <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------
Internet-Drafts | 15 Sep 2003 16:11

I-D ACTION:draft-ietf-rddp-sctp-00.txt

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Remote Direct Data Placement Working Group of the IETF.

	Title		: Stream Control Transmission Protocol (SCTP) Remote 
                          Direct Memory Access (RDMA) Direct Data Placement 
                          (DDP) Adaptation
	Author(s)	: R. Stewart et al.
	Filename	: draft-ietf-rddp-sctp-00.txt
	Pages		: 23
	Date		: 2003-9-15
	
This document describes a method to adapt Direct Data Placement (DDP)
and Remote Direct Memory Access (RDMA) to Stream Control Transmission
Protocol (SCTP) RFC2960 [2] using a generic description found in
[RDMA-Draft] [4] and [DDP-Draft] [3].

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-rddp-sctp-00.txt

To remove yourself from the IETF Announcement list, send a message to 
ietf-announce-request with the word unsubscribe in the body of the message.

Internet-Drafts are also available by anonymous FTP. Login with the username
"anonymous" and a password of your e-mail address. After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-rddp-sctp-00.txt".

A list of Internet-Drafts directories can be found in
http://www.ietf.org/shadow.html 
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt

Internet-Drafts can also be obtained by e-mail.

Send a message to:
	mailserv <at> ietf.org.
In the body type:
	"FILE /internet-drafts/draft-ietf-rddp-sctp-00.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.
Black_David | 29 Sep 2003 21:12

Time to get to work

Now that I've done my job in getting the problem statement
and architecture documents to the ADs for publication, it's
high time for the WG to get back to work from summer
vacation ...

... I've seen very little list traffic on technical issues
over the past month or so, and if I were to judge solely
by that, I'd have a hard time justifying meeting time
in Minneapolis.  Now, I know better - we have serious open
technical issues, and hence I'd like to see some comments/
proposals/discussion on at least the following:

(1) Connection startup.  After all the discussion about
	minimization of round trips, I was shocked to see
	a 3 round-trip startup sequence presented in Vienna.
	The MPA draft authors are responsible for getting
	this down to something more reasonable, and comments
	on the list about how it should be done are welcome.
(2) Invalidation.  We have a lingering issue from way back
	when that RDMAP is handling invalidation of STags,
	which are DDP level resources.  This needs some
	serious examination in this forum.
(3) Security.  The new version of the security draft needs
	to be submitted ASAP - please hold off discussion of
	security until it shows up, but getting the main DDP
	and RDMAP protocols done probably requires obtaining
	WG rough consensus and some level of IETF review
	of the security draft outside the WG.  The shared
	receive resources issue that generated controversy
	in Vienna will need to be resolved to a good degree
	as a requirements discussion around the security
	draft - the specific issue there will be what level
	of protection of shared receiver resources from
	sender misbehavior is required under what
	circumstances and why.

Knowing this group, one invitation to start discussion
should suffice ...

Thanks,
--David (RDDP WG chair)

----------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
black_david <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------
Caitlin Bestler | 30 Sep 2003 16:36

Connection Startup


On Monday, September 29, 2003, at 02:12 PM, Black_David <at> emc.com wrote:

> Now that I've done my job in getting the problem statement
> and architecture documents to the ADs for publication, it's
> high time for the WG to get back to work from summer
> vacation ...
>
> ... I've seen very little list traffic on technical issues
> over the past month or so, and if I were to judge solely
> by that, I'd have a hard time justifying meeting time
> in Minneapolis.  Now, I know better - we have serious open
> technical issues, and hence I'd like to see some comments/
> proposals/discussion on at least the following:
>
> (1) Connection startup.  After all the discussion about
> 	minimization of round trips, I was shocked to see
> 	a 3 round-trip startup sequence presented in Vienna.
> 	The MPA draft authors are responsible for getting
> 	this down to something more reasonable, and comments
> 	on the list about how it should be done are welcome.

With the introduction of the start frame, there is no
need for extra delays during connection setup. What is
proposed works fine; the only adjustment required is
to remove some of the ordering rules. They are not
necessary and can increase the number of round trips.

A quick review:

Starting MPA mode has always been easy when both ends
naturally know the point in the TCP stream where MPA
should begin.

This is easily accomplished when that start point is
zero (i.e., the stream is placed in MPA mode immediately)
or at the conclusion of an existing stream-mode negotiation
(as is proposed for iSER).

The problem case that was raised earlier was when the
need for negotiation is RDMA specific. Specifically,
some private data must be exchanged between the ULP
peers in order to ensure that the RDMA Endpoints are
selected/configured compatibly.

Earlier I had proposed including Private Data within
the MPA Start Frame to accomplish this. The MPA draft
authors counter-suggested an optional, but standard,
negotiation frame that could precede the MPA Start
Frame. Implicitly, the motivation for separating the
negotiation frame from the start frame was to allow
for the negotiation frame to be handled by drivers
and/or middleware rather than by the hardware/firmware.

While I still think the single Start Frame with
variable length Private Data is a better solution,
the optional negotiation frame is an acceptable
solution once unnecessary ordering rules are removed.

The key problem of *not* having a standard method
to exchange Private Data is that without one you cannot
exchange variable length Private Data without doing a
very tricky round-trip up to the ULP immediately before
entering MPA mode.

Consider the case where there is a need to send a variable
length message immediately before the MPA Start Frame.
Without a standard format, the ULP must consume and
parse those bytes. And *then* place the stream in MPA
receive mode.

That either calls for a byte-by-byte fetch from the stream
and/or a putback capability. Either is contrary to the goal
of optimizing performance on non-MPA TCP streams.

A standardized negotiation frame allows the ULP peers to
place their streams in MPA mode (and bind them with RDMA
endpoints) before the variable length private data message.
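
A minimal sketch of why a standard, length-prefixed negotiation frame
avoids the byte-by-byte fetch and the putback: the receiver reads a
fixed-size header, learns the private data length, and consumes exactly
that many bytes, leaving the stream positioned at the MPA Start Frame.
The header layout (a 4-byte tag "NEGO" plus a 2-byte big-endian length)
and all function names are assumptions made for illustration only; they
are not taken from the MPA draft.

```python
import io
import struct

# Hypothetical header: 4-byte tag + 2-byte big-endian private data length.
HEADER = struct.Struct("!4sH")
TAG = b"NEGO"


def encode_negotiation_frame(private_data: bytes) -> bytes:
    """Prefix variable-length private data with a fixed-size header."""
    return HEADER.pack(TAG, len(private_data)) + private_data


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, or raise if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a negotiation frame")
    return data


def read_negotiation_frame(stream) -> bytes:
    """Consume one frame; the stream is left at the first byte after it."""
    tag, length = HEADER.unpack(read_exact(stream, HEADER.size))
    if tag != TAG:
        raise ValueError("not a negotiation frame")
    return read_exact(stream, length)


# The in-memory stream stands in for a TCP connection.
stream = io.BytesIO(encode_negotiation_frame(b"endpoint=7") + b"<MPA Start Frame>")
private_data = read_negotiation_frame(stream)
remainder = stream.read()  # everything after the frame is untouched
```

Because the length is known up front, nothing past the frame is read, so
no putback is needed. On a real socket, read_exact would have to loop
until all n bytes have arrived.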

This works well under all of the following scenarios:

Instant Startup: ULP peers have pre-designated the connection
to be in MPA mode. They each send an MPA Start Frame immediately
to confirm this.

Post-streaming-mode Startup: After conducting streaming mode
negotiations over the TCP connections, the ULP peers have reached
the same state as an Instant Startup. They can each send MPA
Start frames immediately. The end that completed the negotiation
can even place its MPA Start Frame in the same TCP segment as
its final streaming mode response.

IB CM-compatible startup:
	Active side sends a Negotiation Frame with its private data.
	It also pre-designates the RDMA endpoint to be used.

	Passive side consumes the Negotiation Frame and passes it to
	the ULP as an event.

	Passive side uses the private data to select/configure its
	endpoint. It responds by sending its responding Negotiation
	Frame, associating its endpoint and sending its MPA Start
	Frame. The MPA Start Frame SHOULD be in the same TCP segment
	as the Negotiation Frame when possible.

	Active side consumes the Negotiation Frame, and immediately places
	the TCP connection in MPA mode (sending its MPA Start Frame
	and consuming the one sent to it). The previously designated
	RDMA endpoint is associated, and will handle all DDP Segments
	that are after the MPA Start Frame. The peer MPA Start Frame
	can be consumed immediately. The private data from the response
	Negotiation Frame is reported to the ULP.

The simplified rules therefore are:

- An implementation MUST NOT send the MPA Start Frame until
   it is ready to process FULPDUs starting at the agreed
   point in the TCP stream.
- An implementation MUST NOT send FULPDUs prior to receiving
   an MPA Start Frame from its peer.

	These two rules ensure that there is an RDMA Endpoint
	associated with the stream before any DDP Segments are sent.

- An implementation MUST NOT send FULPDUs earlier in its
   output TCP Stream than the MPA Start Frame.
- An implementation MAY send zero or more Negotiation Frames
   prior to the MPA Start Frame.
- If an implementation does not receive the expected
   MPA Start Frame at the agreed point in the TCP Stream
   it MUST terminate the connection.
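
The five rules above can be sketched as a small per-connection state
check. This is an illustrative model, not an implementation of the MPA
draft: the class, its method names, and the frame labels are all
hypothetical, and treating a Negotiation Frame sent after the MPA Start
Frame as an error is one reading of the fourth rule.

```python
class MpaStartupError(Exception):
    """Raised when a step would violate one of the startup rules."""


class MpaStartup:
    """Tracks one side of a connection against the simplified rules."""

    def __init__(self):
        self.endpoint_ready = False   # able to process FULPDUs
        self.sent_start = False
        self.received_start = False
        self.output = []              # frames placed on the output stream

    def associate_endpoint(self):
        self.endpoint_ready = True

    def send_negotiation(self, private_data):
        # Rule 4: Negotiation Frames only prior to the MPA Start Frame.
        if self.sent_start:
            raise MpaStartupError("Negotiation Frame after MPA Start Frame")
        self.output.append("negotiation")

    def send_start(self):
        # Rule 1: do not send Start until ready to process FULPDUs.
        if not self.endpoint_ready:
            raise MpaStartupError("not ready to process FULPDUs")
        self.sent_start = True
        self.output.append("start")

    def receive_at_agreed_point(self, frame_kind):
        # Rule 5: anything but a Start Frame here ends the connection.
        if frame_kind != "start":
            raise MpaStartupError("expected MPA Start Frame; terminate")
        self.received_start = True

    def send_fulpdu(self, payload):
        # Rule 3: no FULPDU earlier in the output than our Start Frame.
        if not self.sent_start:
            raise MpaStartupError("FULPDU before own MPA Start Frame")
        # Rule 2: no FULPDU before the peer's Start Frame is received.
        if not self.received_start:
            raise MpaStartupError("FULPDU before peer MPA Start Frame")
        self.output.append("fulpdu")


def violates(step, *args):
    """Return True if the step is rejected by the rules."""
    try:
        step(*args)
    except MpaStartupError:
        return True
    return False


active = MpaStartup()
active.send_negotiation(b"private data")
early_start_rejected = violates(active.send_start)
active.associate_endpoint()
active.send_start()
early_data_rejected = violates(active.send_fulpdu, b"payload")
active.receive_at_agreed_point("start")
active.send_fulpdu(b"payload")
```

Note that nothing in the model orders the local Start Frame relative to
the peer's Negotiation Frame; only the checks listed in the rules remain.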
Caitlin Bestler | 30 Sep 2003 17:14

Re: Invalidation


On Monday, September 29, 2003, at 02:12 PM, Black_David <at> emc.com wrote:

> (2) Invalidation.  We have a lingering issue from way back
> 	when that RDMAP is handling invalidation of STags,
> 	which are DDP level resources.  This needs some
> 	serious examination in this forum.

I'm not sure I remember anything that qualifies as an
"issue". There are certainly some quirks that relate to
the fact that the invalidation is inherently asynchronous
because it is processed on the "completion track" and at
least theoretically by a different layer (and hence potentially
as well).

Quirk #1: Invalidation is not a fence. "Later" DDP Segments
may be placed prior to the invalidation -- even if received
in order. Invalidation offers no guarantees to the sender,
only to the receiver. And the receiver is only guaranteed
that there will be no further placements *when the notice
of the invalidation is delivered*. The only guarantee about
what was placed at that point is that *earlier* DDP segments
will have been placed. There is no guarantee that *later*
DDP segments were *not* placed.
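
A toy model of Quirk #1, assuming a receiver that places segments on
arrival but applies invalidations only when completions are delivered.
The class and method names are hypothetical, and this shows just one
behavior the wording permits, not what any implementation must do.

```python
class Receiver:
    """Separates the placement track from the completion track."""

    def __init__(self, stags):
        self.valid = set(stags)
        self.placed = []            # (stag, payload) written to memory
        self.pending = []           # invalidations awaiting delivery

    def on_segment(self, stag, payload):
        """Placement track: place at once if the STag is still valid."""
        if stag in self.valid:
            self.placed.append((stag, payload))
            return True
        return False

    def on_invalidate_message(self, stag):
        """The request arrives in order but is only queued here."""
        self.pending.append(stag)

    def deliver_completions(self):
        """Completion track: the STag stops being usable only now."""
        delivered = list(self.pending)
        for stag in delivered:
            self.valid.discard(stag)
        self.pending.clear()
        return delivered


rx = Receiver({0x10})
rx.on_segment(0x10, "earlier")
rx.on_invalidate_message(0x10)
later_placed = rx.on_segment(0x10, "later")    # after the request, still placed
rx.deliver_completions()                       # ULP is notified here
after_notice = rx.on_segment(0x10, "too late") # no placement after the notice
```

In this model the "later" segment is placed even though it arrived in
order after the invalidation request; only placements after the notice
is delivered are refused.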

Quirk #2: Invalidation may or may not disrupt an in-process
RDMA Read. That is, there is no specification on *when* an
RDMA Read engine translates STags.

The last issue is that the current wording does not allow
a local implementation to suppress remote invalidation on
specific tags. This essentially prevents exporting STags
that are associated with permanent data structures such
as Memory Regions (as opposed to windows). That should be
fixed.

In general, remote invalidation should not be viewed as
providing any guarantees to the remote side. It is a
service to the local ULP. While it ought to remain
mandatory to implement, the local ULP should have
the option of not using it for a specific STag.
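
As a sketch of that option, a local STag table could carry a per-STag
flag saying whether remote invalidation is honored. Everything here is
hypothetical (the table, the flag name, and the choice to ignore a
suppressed request silently); the drafts do not define this behavior.

```python
from dataclasses import dataclass


@dataclass
class STagEntry:
    valid: bool = True
    allow_remote_invalidate: bool = True


class STagTable:
    """Local registry of exported STags."""

    def __init__(self):
        self.entries = {}

    def register(self, stag, allow_remote_invalidate=True):
        self.entries[stag] = STagEntry(
            allow_remote_invalidate=allow_remote_invalidate
        )

    def remote_invalidate(self, stag):
        """Apply a peer's invalidation; return True if the STag was invalidated."""
        entry = self.entries.get(stag)
        if entry is None or not entry.valid:
            return False
        if not entry.allow_remote_invalidate:
            # Assumption: a suppressed request is ignored, not an error.
            return False
        entry.valid = False
        return True


table = STagTable()
table.register(0x20)                                 # e.g. a memory window
table.register(0x30, allow_remote_invalidate=False)  # e.g. a Memory Region
window_invalidated = table.remote_invalidate(0x20)
region_invalidated = table.remote_invalidate(0x30)
```

The window STag is invalidated on request, while the STag tied to a
permanent structure stays valid because the local ULP opted out.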
