Jim Pinkerton | 6 Dec 2003 02:33

RE: DDP 7.3 - Data Source MUST transmit DDP Messages in order


I agree that state explosion on the receiver is a big concern. 
It's really a question of requiring a scoreboarding receiver versus a
single counter. Pretty radically different complexity, when you throw in
that a message can be 2 gigabytes in length and segmented on any
arbitrary boundary. Look at SACK, for example - they understood that
state explosion is a huge deal, and at some point you have to punt.
Their form of punting was capping the number and then allowing even that
amount to be discarded if things really hit the fan. Note I'm _not_
suggesting that we implement SACK style limits on out-of-order
processing. If we were to change, I'd personally prefer going the other
direction - require in-order DDP segments on an in-order TCP stream.

I think the issue you have is with section 7.3 having a SHOULD versus
a MUST, specifically:

At the Data Source, DDP ...
*	SHOULD transmit DDP Segments within a DDP Message in increasing
MO order for Untagged DDP Messages and in increasing TO order for Tagged
DDP Messages. 

So hopefully the SHOULD means that the normal/fast path is in-order DDP
Segments within an in-order TCP stream, but it also means that the
receiver must be able to handle the out-of-order DDP segments within a
DDP message on an in-order TCP stream case. So it doesn't get you out of
the state explosion - it just means that you don't have to necessarily
handle it in the fast path.
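
To make the complexity gap concrete, here is a minimal Python sketch; it is
not taken from the DDP draft. The class name `MessageReassembly` and its
methods are hypothetical. It tracks one DDP Message with a single counter
while segments arrive in offset order, and falls back to a scoreboard of
byte ranges only when a segment arrives out of order.

```python
import bisect


class MessageReassembly:
    """Placement tracking for one DDP Message (hypothetical sketch)."""

    def __init__(self, length):
        self.length = length
        self.next_offset = 0   # fast path state: a single counter
        self.scoreboard = []   # slow path state: sorted, disjoint (start, end) ranges

    def receive(self, offset, size):
        end = offset + size
        if offset == self.next_offset and not self.scoreboard:
            # In-order segment and no holes outstanding: bump the counter.
            self.next_offset = end
            return
        # Out-of-order segment (or holes outstanding): record the range.
        self._add_range(offset, end)
        self._drain()

    def _add_range(self, start, end):
        kept = []
        for s, e in self.scoreboard:
            if e < start or s > end:
                kept.append((s, e))          # disjoint, keep as-is
            else:
                start, end = min(s, start), max(e, end)  # merge overlap/adjacency
        bisect.insort(kept, (start, end))
        self.scoreboard = kept

    def _drain(self):
        # Fold ranges that now touch the counter back into the counter.
        while self.scoreboard and self.scoreboard[0][0] <= self.next_offset:
            _, e = self.scoreboard.pop(0)
            self.next_offset = max(self.next_offset, e)

    def complete(self):
        return self.next_offset == self.length and not self.scoreboard
```

With in-order segments the state stays at one integer no matter how large
the message is. With arbitrary ordering the scoreboard grows with the number
of holes, which is the state explosion concern for a message that can be
2 gigabytes long and segmented on any boundary.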

Jim


Michael Krause | 9 Dec 2003 00:22

Security Draft Comments


Overall, the security draft is coming together, with no real major issues to 
resolve from what I've reviewed to date, so we should be able to get it 
cleaned up in a relatively short period of time.

Mike

(1) Draft might benefit from a section that describes current RDMA / DDP 
techniques that limit exposures.  For example, a description of the basic 
error semantics and how these lead to hard errors and connection 
shutdown.  While some of this is implied or spread out to varying extents 
within the draft specification, consolidation into a single section would 
simplify the subsequent discussion as well as add clarity.  Applicable 
error conditions to document: protection violation, STag violation, base / 
bounds violation, etc.  In addition, a brief description of window vs. 
region, and of how the fine-grained exposure of a window benefits the ULP 
by constraining the memory that can be targeted, would be useful.

(2) Unclear if there is a need for a glossary but some specification terms 
are repeated here while others are not.  May be simpler to just reference 
the specs with the agreed-to terminology and definitions rather than 
redefine them in potentially imprecise terms.

(3) Figure 1.  One might consider having a line between the admin and the 
privileged application given the resource manager is not required but its 
essence may be integrated into the privileged application.  For example, a 
storage subsystem may already have defined interfaces to administration 
services and if it has exclusive control of the RNIC would not require a 
separate resource manager.


Talpey, Thomas | 21 Dec 2003 16:14

NFS/RDMA ONC RPC reference implementation release

Network Appliance is pleased to announce general availability of a
client implementation of NFSv3 over RDMA for Linux kernels. This
client, with full source code, is available for download from

        http://sourceforge.net/projects/nfs-rdma/

The client is implemented entirely within the Linux ONC RPC code,
and requires no changes to the NFS layer. It is a relatively complete
implementation of the draft ONC RPC RDMA protocols published
earlier this year.

NetApp is publishing this implementation in order to encourage interest
in and support of this protocol, and to enable the future use of NFSv4 over
RDMA. The Linux v3 client will be tested at the upcoming NFS
Connectathon in February.

The project description with references is included below. We look forward
to discussion and experimentation with the new protocol in the months to
come. Please see the above Sourceforge project for updates.

Happy New Year!

Tom.

-----

Project Description:

http://sourceforge.net/projects/nfs-rdma/

The NFS/RDMA project is a reference implementation of a new RDMA-capable
ONC RPC transport for use by Linux kernel NFS.

Client support for NFSv3 is the initial goal. This transport implementation is
available under a BSD-style license.

Through use of an RDMA transport, a very high level of performance may be
achieved, including zero-copy and zero-touch direct I/O. No modifications to
the Linux NFSv3 client code are required.

The protocol is conformant to that published in the following Internet Drafts:
        draft-callaghan-rpcrdma-00.txt
        draft-callaghan-nfsdirect-00.txt
        draft-talpey-nfs-rdma-problem-statement-00.txt

and the implementation uses the kDAPL RDMA API
        http://sourceforge.net/projects/dapl/

The kDAPL API supports InfiniBand and iWARP RDMA transports.

Follow-on goals include server support, NFSv4 support, and additional transport
research.

The Linux NFS kernel server code is not addressed in the initial release. Portions
of the transport implementation may provide the base for future server work.

NFSv4 support may incorporate protocol modifications from the Internet Draft:
        draft-talpey-nfsv4-rdma-sessions-00.txt

An RDMA-enabled NFSv4 implementation is a natural high performance
filesystem for clustered or workgroup computing, due to the enhanced file
sharing capabilities of NFSv4,  such as its integrated locking, delegations,
and security.

The transport interface framework created as part of this project is general,
and may be applied to other efforts. Support for RPC transports employing
TCP Offload Engines, or hybrid approaches using Fibre Channel/iSCSI in
conjunction with NFS will also be considered.

Kanevsky, Arkady | 23 Dec 2003 00:16

Proposal to augment MPA connection set up frame

I have a comment on the MPA draft.
It now defines an Initiator/Responder
sequence very nicely,
but it looks deficient for a broad set of ULPs.
Most client-server ULPs use two types
of responses: "accept" and "reject".
The MPA draft does not provide a way
to differentiate between these two responses.

Here is one example of an application use.
One common way private data is used
is to negotiate RDMA Read credits for
a connection to be used by RDDP.
The Initiator passes the requested number of
RDMA Read credits in private data.
The Responder either accepts the connection or
rejects the request, specifying how many
RDMA Read credits it can support.

Currently the Responder has two legal choices.
First, it can terminate the connection.
This does not convey any information to
the Initiator.

The second choice is to generate a Responder
Frame even though the requested RDMA Read
credits cannot be satisfied.
This will establish a connection, but it cannot
be used as intended by the Initiator.
The Initiator can either tear down the established
stream mode connection or use the connection
with the RDMA Read credits the Responder supports.
This is a de facto ULP protocol, since the Responder
Frame will include the RDMA Read credits
it can support. On top of that, both sides
have to agree on the action they will take.
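
The second choice can be sketched as follows. This is a hypothetical
illustration in Python: the two-byte big-endian credit encoding, the
function names and the `min_acceptable` parameter are all assumptions,
since the layout of private data is up to the ULP.

```python
import struct


def encode_credits(credits):
    """Pack an RDMA Read credit count as private data (assumed layout)."""
    return struct.pack("!H", credits)


def decode_credits(private_data):
    """Read the credit count back out of private data (assumed layout)."""
    return struct.unpack("!H", private_data[:2])[0]


def responder_reply(request_private_data, supported):
    """Responder answers with a Responder Frame carrying what it can support,
    even when that is less than what the Initiator requested."""
    _requested = decode_credits(request_private_data)  # parsed but cannot be refused
    return encode_credits(supported)


def initiator_decision(requested, reply_private_data, min_acceptable):
    """Initiator must infer accept/reject from the private data alone."""
    offered = decode_credits(reply_private_data)
    if offered >= requested:
        return ("use", offered)
    if offered >= min_acceptable:
        return ("use-reduced", offered)
    return ("tear-down", offered)
```

Nothing in the frame itself says whether the request was refused. The
Initiator learns it only by parsing private data, and both sides must
already share this convention for the exchange to work.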

While a ULP can use a protocol over private data
to differentiate between accept and reject,
given how common this semantic is among ULPs,
we can use one bit from the Reserved area of
the MPA frame, in the Responder frame, to differentiate
between accept and reject responses.

Proposal:
for the Responder MPA frame, the 1st Reserved bit
is to be used as an accept/reject bit: 0 for accept,
1 for reject.
The bit will be called Type (T) in figure 7.
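
A minimal sketch of how such a bit could be set and tested, in Python.
The mask value `0x80` and the one-byte flags field are assumptions made
only for illustration; the real position of the first Reserved bit is
whatever figure 7 of the MPA draft defines.

```python
# Assumed position of the first Reserved bit in a one-byte flags field.
# This value is hypothetical and is not taken from the MPA draft.
T_REJECT_MASK = 0x80


def set_response_type(flags, reject):
    """Return flags with the Type (T) bit set to 1 for reject, 0 for accept."""
    if reject:
        return (flags | T_REJECT_MASK) & 0xFF
    return flags & ~T_REJECT_MASK & 0xFF


def is_reject(flags):
    """True if the Responder marked the frame as a reject."""
    return bool(flags & T_REJECT_MASK)
```

With the bit in the frame, an Initiator can tell accept from reject
without understanding the ULP's private data format, and the private
data remains free to carry details such as supported credits.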

Arkady Kanevsky                                email: arkady <at> netapp.com
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone: 781-768-5300
