Benny Halevy | 17 Jun 2013 12:27

Re: status and next steps for RFC5664bis (DRAFT FOR REVIEW)

Brent, I'm good with this draft.

Benny

On Sat, Jun 8, 2013 at 2:17 AM, Welch, Brent <welch <at> panasas.com> wrote:

Attached is a draft dated June 7.

 

I remember that Dave Noveck sent comments, but I can’t find them – sorry.  If you can dig those up we’ll address them.

 

Boaz has a comment about the

  netaddr4 ota_netaddr

that is used for device discovery.  This type is a union of two fields, r_netid and r_addr.  The r_addr is formatted as a URI, and the other field is formatted as IP:PORT.  You want the spec to say that we’ve been using the URI form in the r_addr branch of the netaddr4 union, correct?  Something like the following added to section 4.2.2 Device Network Address:

 

<quote>

The netaddr4 type has r_netid and r_addr branches in its union.  The server SHOULD use the r_addr field formatted as a URI that specifies the location of the OSD.  This is compatible with the iSCSI auto-login infrastructure found in many client systems.

</quote>

 

Do we want to add the netaddr4 type as <artwork>?  Can you give a better example for the URI?
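
For readers following the netaddr4 discussion, here is a minimal Python sketch of how a client might tell the two r_addr forms apart. It is not text from the draft. It assumes netaddr4 carries two strings (r_netid and r_addr), and that a non-URI r_addr for TCP over IPv4 uses the universal-address form "h1.h2.h3.h4.p1.p2", where the port is p1*256 + p2. The class and function names, and the iscsi:// value in the usage note, are hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class NetAddr4:
    """Hypothetical in-memory form of netaddr4: two strings."""
    r_netid: str
    r_addr: str


def parse_universal_ipv4(r_addr):
    """Parse 'h1.h2.h3.h4.p1.p2' into (host, port)."""
    parts = r_addr.split(".")
    if len(parts) != 6:
        raise ValueError("expected six dot-separated fields")
    nums = [int(p) for p in parts]
    if any(n < 0 or n > 255 for n in nums):
        raise ValueError("each field must be in 0..255")
    host = ".".join(parts[:4])
    port = nums[4] * 256 + nums[5]
    return host, port


def describe(addr):
    """Return (form, host, port) for either r_addr form."""
    split = urlsplit(addr.r_addr)
    if split.scheme and split.netloc:
        # r_addr looks like a URI: scheme://host[:port]/...
        return ("uri", split.hostname, split.port)
    host, port = parse_universal_ipv4(addr.r_addr)
    return ("universal", host, port)
```

For example, describe(NetAddr4("tcp", "192.0.2.10.12.188")) yields the universal form with port 3260, and describe(NetAddr4("tcp", "iscsi://osd1.example.com:3260/")) yields the URI form with the same port.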

 

--

Brent

 

From: Noveck, David [mailto:david.noveck <at> emc.com]
Sent: Wednesday, May 15, 2013 9:48 AM
To: Spencer Shepler; bhalevy <at> tonian.com; Harrosh, Boaz; Welch, Brent
Cc: nfsv4 <at> ietf.org
Subject: RE: status and next steps for RFC5664bis

 

IIRC, there was some discussion of moving this to last-call and I was asked to provide a review.  I could dig that up if anybody is interested.  The last I heard was that my comments would be responded to, but I’ve heard nothing since then.

 

My understanding is that there is no interest in continuing this, and that those who were interested in it are now interested in doing what I call the son-of-pNFS-object work instead.  Can people who really know tell me whether I’m right?

 

It appears that the administrative/charter side of things has not kept pace with the technical interests of working group participants.  Clearly that’s an undesirable situation, but I have no idea how it would be best rectified.

 

Assuming I’m right, it would be a shame if the following sort of argument prevailed:

·         How can we discuss yet a new mapping type when we have this 5664bis stuff hanging fire?

·         If there is no energy to do the latter, how can the working group do the former?

 

The relevant points are that sometimes people make mistakes and also that circumstances change and make previous decisions, even those made by rough consensus, incorrect in retrospect.

 

 

 

 

 

 

 

From: nfsv4-bounces <at> ietf.org [mailto:nfsv4-bounces <at> ietf.org] On Behalf Of Spencer Shepler
Sent: Wednesday, May 15, 2013 1:46 AM
To: bhalevy <at> tonian.com; Boaz Harrosh (bharrosh <at> panasas.com); welch <at> panasas.com
Cc: nfsv4 <at> ietf.org
Subject: [nfsv4] status and next steps for RFC5664bis

 

 

Hello,

 

The following bis document expired 8 months ago and is part of the working group’s charter.

http://tools.ietf.org/id/draft-ietf-nfsv4-rfc5664bis-01.txt

 

Please provide a timeline for updates or an opinion that the work should be removed from the charter and the WG’s milestones.

 

Thanks,

Spencer

 


_______________________________________________
nfsv4 mailing list
nfsv4 <at> ietf.org
https://www.ietf.org/mailman/listinfo/nfsv4
RFC Errata System | 15 Jun 2013 15:18

[Editorial Errata Reported] RFC5661 (3653)

The following errata report has been submitted for RFC5661,
"Network File System (NFS) Version 4 Minor Version 1 Protocol".

--------------------------------------
You may review the report below and at:
http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=3653

--------------------------------------
Type: Editorial
Reported by: Kanda Motohiro <kanda.motohiro <at> gmail.com>

Section: 13.4.2

Original Text
-------------
The destinations of the first 13 storage units are:

Corrected Text
--------------
The destinations of the first 13 stripe units are:

Notes
-----
Same errata on Section 13.4.3

Instructions:
-------------
This errata is currently posted as "Reported". If necessary, please
use "Reply All" to discuss whether it should be verified or
rejected. When a decision is reached, the verifying party (IESG)
can log in to change the status and edit the report, if necessary. 

--------------------------------------
RFC5661 (draft-ietf-nfsv4-minorversion1-29)
--------------------------------------
Title               : Network File System (NFS) Version 4 Minor Version 1 Protocol
Publication Date    : January 2010
Author(s)           : S. Shepler, Ed., M. Eisler, Ed., D. Noveck, Ed.
Category            : PROPOSED STANDARD
Source              : Network File System Version 4
Area                : Transport
Stream              : IETF
Verifying Party     : IESG

Chuck Lever | 3 Jun 2013 21:53

FedFS future work

I promised to kindle a conversation here about the FedFS presentation on the agenda for IETF 87.  There are
two outstanding issues which I will deal with in reply to older threads.  Here I want to introduce some
future topics to consider.

  o  Multi-domain authentication (in progress)

  o  SMB support
     -  NSDB schema updates (in progress)
     -  Can (Samba) DFS and FedFS junctions co-exist?
     -  What would a common administrative interface look like,
        and how would our ADMIN protocol enable it?

  o  Pre-populating /nfs4 without the AFS "ls -l /afs" problem

  o  Smarter use of LDAP
     -  Support for SASL/GSS-API for NSDB access
     -  Use an LDAP search instead of namingContext updates
        to find NCEs
     -  Using LDAP referrals and other replication mechanisms

  o  Compromised NSDBs
     -  Integrating certificate revocation support
     -  ADMIN support for fencing NSDBs

  o  NSDB discovery for admin tools

  o  Ensuring servers hand out reachable locations
     -  FSN may have an exhaustive list of locations, but not all
        locations may be reachable by every client
     -  IPv6 link-local or other unroutable addresses
     -  DMZ considerations
     -  How to represent a multi-homed server with FSLs

  o  New ADMIN operations
     -  Support for listing all NSDBs known to a fileserver
     -  Support for removing NSDB connection parameters
     -  Fleshing out FEDFS_CREATE_REPLICATION and friends

Given how few comments Andy has received on the multi-domain effort, the immediate question is how much
appetite do folks have for embarking on a next-generation NSDB and ADMIN protocol?

--

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Rick Macklem | 3 Jun 2013 13:45

Re: Question on Delegation CB_RECALL and DELEGRETURN

Arnab Bakshi wrote:
> Thanks Rick.
> 
> 
> As per my understanding the client may incorrectly close a TCP
> connection gracefully without returning Delegation (this behavior is
> unlikely but possible), and RFC doesn't prevent it to be so. Servers
> should be able to handle this scenario by internally Revoking the
> granted delegations as soon as the TCP connection is down.
> 
A client can close/create new TCP connections whenever they please.
It has no effect on what the server does for NFSv4.0. For NFSv4.1,
I'm not sure if a server can discard a session once the connection(s)
bound to it are all closed or not? (I've just done the basics of a server
and haven't yet thought through exactly when a session can be discarded.
At this point my server only discards a session when it gets a
DESTROY_SESSION from the client.) I think it can still bind another
connection to the session. Maybe someone else can answer this.

As for the second part, I believe it is up to the server implementor
as to whether you choose to eventually revoke the delegation when a
client fails to DELEGRETURN it in a reasonable time (at least a lease
duration) or only if the lease expires (a lease can be renewed by
other operations, so waiting a lease duration doesn't imply expiry of
the lease).
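
To make the choice above concrete, here is a minimal Python sketch of how a server might decide what to do with a conflicting Open while a recalled delegation is outstanding. It is not the FreeBSD code. The names (Delegation, Action, on_conflicting_open, revoke_unresponsive) are hypothetical, times are plain seconds, and the optional revoke-after-one-lease-duration policy is just one way to read "a reasonable time".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    DELAY = "return NFS4ERR_DELAY to the conflicting Open"
    EXPIRE = "lease expired: discard client state, let the Open succeed"
    REVOKE = "policy timeout: revoke the delegation, let the Open succeed"


@dataclass
class Delegation:
    lease_renewed_at: float            # last time the holder renewed its lease
    recall_sent_at: Optional[float] = None  # when CB_RECALL was first sent


def on_conflicting_open(deleg, now, lease_time, revoke_unresponsive=False):
    """Decide how to answer an Open that conflicts with `deleg`."""
    # A crashed or partitioned holder stops renewing; its lease runs out.
    if now - deleg.lease_renewed_at > lease_time:
        return Action.EXPIRE
    # First conflict: a real server would send CB_RECALL here.
    if deleg.recall_sent_at is None:
        deleg.recall_sent_at = now
        return Action.DELAY
    # Optional policy: the holder is alive (lease renewed by other
    # operations) but has not sent DELEGRETURN for a full lease duration.
    if revoke_unresponsive and now - deleg.recall_sent_at > lease_time:
        return Action.REVOKE
    return Action.DELAY
```

With revoke_unresponsive left False, a holder that keeps renewing its lease keeps the conflicting Open at NFS4ERR_DELAY indefinitely; turning it on bounds the wait at roughly one lease duration after the recall.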

> 
> For failover/resiliency case maybe the TCP connection is not shutdown
> gracefully and it resumes after the outage is recovered. Tying the
> state to TCP connection would have worked unless maybe required some
> handling for failover scenarios.
> 
As far as I know, the only rule is that a client must create a new TCP
connection to retry a compound RPC. (Retry means "sending the same
RPC with same xid in header, etc". A client will normally make another
attempt to perform a compound after receiving NFS4ERR_DELAY, but it
will use a fresh RPC header with a different xid, so this isn't a retry
for the above rule.)
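
The distinction between a retry and a fresh attempt can be sketched as follows. This is an illustration only: RpcClient and its methods are hypothetical names, and connections are modelled as a simple counter rather than real sockets.

```python
import itertools


class RpcClient:
    """Toy model of xid and connection handling for compound RPCs."""

    def __init__(self):
        self._xids = itertools.count(1)
        self.connection_id = 0

    def new_call(self, payload):
        """Build a call with a fresh xid on the current connection."""
        return {"xid": next(self._xids), "payload": payload,
                "conn": self.connection_id}

    def retry(self, call):
        """True retry: same xid and payload, but on a new connection."""
        self.connection_id += 1
        return {**call, "conn": self.connection_id}

    def reissue_after_delay(self, call):
        """After NFS4ERR_DELAY: same payload, fresh xid, so not a retry."""
        return self.new_call(call["payload"])
```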

> 
> 
> I am sorry but I did not get the argument part. Is it the FreeBSD code
> you are talking about or the concept of revoking after providing
> sufficiently long timeout..
> 
Just ignore my comment w.r.t. tying state to connections. It isn't a
part of the protocol.

rick

> 
> Regards
> Arnab
> 
> 
> 
> On Sat, Jun 1, 2013 at 5:14 PM, Rick Macklem < rmacklem <at> uoguelph.ca >
> wrote:
> 
> 
> 
> 
> Arnab Bakshi wrote:
> > Thanks Rick..my comments inline...
> >
> > On Jun 1, 2013 5:03 AM, "Rick Macklem" < rmacklem <at> uoguelph.ca >
> > wrote:
> > >
> > > Arnab Bakshi wrote:
> > > > Hi,
> > > >
> > > >
> > > > Was going through the RFC5661 and particularly focusing on the
> > > > Delegation part. I have couple of questions which I would like
> > > > to
> > > > put
> > > > here:
> > > >
> > > >
> > > > Scenario: Suppose a NFS client (client A) has been granted
> > > > Read/Write
> > > > delegation by a NFS server for a particular file. Now after
> > > > sometime
> > > > another client (client B) requests conflicting Delegation on the
> > > > same
> > > > file. Ideally in this case the server will send a CB_RECALL to
> > > > client-A to get back the delegation. Suppose the client
> > > > behaviour
> > > > is
> > > > that it sends a CB_OK response and does actual IO to flush its
> > > > state
> > > > to the server.
> > > >
> > > >
> > > > Question1: Client-A keeps sending IO which say is suppose very
> > > > big
> > > > and
> > > > time consuming therefore delaying DELEGRETURN. What steps does
> > > > the
> > > > server take? Will it try to revoke the delegation forcefully
> > > > after
> > > > some elapsed time and how?.
> > > >
> > > Well, I'm not sure there is a well defined correct answer for
> > > these
> > > questions, but here is what the FreeBSD server does.
> > >
> > > Since Client-A keeps sending IO requests, it is not
> > > crashed/network
> > > partitioned and, as such, the FreeBSD server will return
> > > NFS4ERR_DELAY
> > > to conflicting Open requests until Client-A does DELEGRETURN. (A
> > > client
> > > that never sends a DELEGRETURN is broken and since there are only
> > > 3-4
> > > clients out there, I don't believe any of them are broken this
> > > way?)
> > >
> > The implementation makes sense. To put it more specifically if
> > client
> > closes the transport connection before it sends DELEGRETURN then
> > hopefully server clears the state for the client-id and the
> > associated
> > state-id,is it so...? The clients available are surely not broken
> > but
> > I am interested more from a protocol definition perspective.
> 
> Nope. As far as I know, there is nothing in the RFCs that states a
> client
> can't do a fresh connection whenever it wants to. (A client is
> required to
> use a new connection whenever it is retrying an RPC after it fails to
> receive a reply to it.)
> 
> I once suggested tying state to TCP connections, but others didn't
> like
> the idea at the time, as I recall.
> 
> I can see an argument for eventually revoking the delegation, even if
> the client is sending other RPCs to the server but no DELEGRETURN. To
> be honest, I can't remember if I did that for the FreeBSD server or
> not.
> (I'd have to look at the code. Just because I wrote it doesn't mean I
> can remember what it does.;-)
> 
> rick
> 
> 
> 
> > > >
> > > > Question2: Client-A sends IO for sometime and then somehow
> > > > becomes
> > > > idle and doesn't send the DELEGRETURN. What does the server do
> > > > in
> > > > this
> > > > case, will it revoke and how?
> > > >
> > > If Client-A has crashed or is network partitioned, the lease will
> > > expire
> > > and after that happens the FreeBSD throws away all state
> > > (including
> > > all
> > > delegations) for that client and returns NFS4ERR_EXPIRED for the
> > > clientid
> > > and all associated stateids. After the delegation has expired, the
> > > conflicting
> > > Open succeeds. (The conflicting Open gets NFS4ERR_DELAY for all
> > > attempts
> > > until lease expiry.)
> > >
> > > I haven't looked at WANT_DELEGATION yet, so I am not sure if I
> > > will
> > > handle
> > > that like a conflicting Open or simply return something like the
> > > extended
> > > delegate none with resource as the reason for not issuing the
> > > delegation.
> > >
> > > rick
> > >
> > > >
> > > > I know this is a negative scenario but would be helpful to know
> > > > what
> > > > should be the behavior from the server side.
> > > >
> > > >
> > > > Thanks in advance...
> > > >
> > > >
> > > > Regards
> > > > Arnab
> >
> 
> 

Arnab Bakshi | 31 May 2013 12:11

Question on Delegation CB_RECALL and DELEGRETURN

Hi,

I was going through RFC 5661, focusing in particular on the delegation part. I have a couple of questions that I would like to put here:

Scenario:     Suppose an NFS client (client A) has been granted a Read/Write delegation by an NFS server for a particular file. Some time later, another client (client B) requests a conflicting delegation on the same file. Ideally, in this case the server will send a CB_RECALL to client-A to get the delegation back. Suppose the client's behaviour is that it sends a CB_OK response and does actual IO to flush its state to the server.

Question1:     Client-A keeps sending IO which is, say, very big and time-consuming, therefore delaying DELEGRETURN. What steps does the server take? Will it try to revoke the delegation forcefully after some elapsed time, and how?

Question2:     Client-A sends IO for some time and then somehow becomes idle and doesn't send the DELEGRETURN. What does the server do in this case? Will it revoke, and how?

I know this is a negative scenario, but it would be helpful to know what the behavior should be on the server side.

Thanks in advance...

Regards
Arnab
Spencer Shepler | 29 May 2013 17:46

closing on the IETF87 NFSv4 WG meeting (review and comment)

 

See below for the draft agenda at this time.  Please let me know what I have missed or what needs to be adjusted.

 

We have a number of topics it seems.  It also appears the content will be slides given the lack of backing I-Ds at this point.

 

Let’s close by the end of today such that we can move forward with a meeting request if we need one.

 

Spencer

 

 

 

 

NFSv4 Working Group

Berlin

July 28, 2013

 

 

Note Well

 

Agenda Bashing

 

Flexible File Layouts (Halevy) (15-20 minutes)

 

FedFS issues (Lever) (15 minutes)

      - this is a placeholder; expecting additional detail for this item

 

pNFS Lustre layout (Faibish)

 

Coherent data caching for NFS - (Eshel) (20 minutes)

 

RPCSECGSSv3 - (Williams - uncommitted)

 

End-to-end Integrity - (Lever) (time ?)

 

Versioning Model - (Noveck) (time ?)

 

 

 

Martin Stiemerling | 28 May 2013 17:33

AD question about i18n in draft-ietf-nfsv4-rfc3530bis

Hi all,

This question came up in the IESG review about Section 12
of draft-ietf-nfsv4-rfc3530bis about i18n:

Is any implementation implementing what's in Section 12 on 
internationalization?

Thanks in advance,

   Martin

--

-- 
IETF Transport Area Director

martin.stiemerling <at> neclab.eu

NEC Laboratories Europe
NEC Europe Limited
Registered Office:
Athene, Odyssey Business Park, West End  Road, London, HA4 6QE, GB
Registered in England 2832014

Frank S Filz | 23 May 2013 23:05

Question about compound operation 2

RFC 5661 has the following text:

 Note that operations zero and one are not defined for the COMPOUND
procedure. Operation 2 is not defined and is reserved for future
definition and use with minor versioning. If the server receives an
operation array that contains operation 2 and the minorversion field
has a value of zero, an error of NFS4ERR_OP_ILLEGAL, as described in
the next paragraph, is returned to the client. If an operation array
contains an operation 2 and the minorversion field is non-zero and
the server does not support the minor version, the server returns an
error of NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the
NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other
errors.

But RFC 5661 doesn't define an operation 2.

Is operation 2 intended to be a sort of "minor version ping"? If so, I would assume it has no parameters.
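
For reference, the precedence described in the quoted text can be sketched in a few lines of Python. This is only one reading of that paragraph, not an implementation from any server: the function name and the string return values are hypothetical, the whole operation array is checked up front rather than operation by operation, and treating operations 0, 1 and 2 as illegal under every supported minor version is an assumption (the quoted text only covers minor version zero).

```python
RESERVED_OPS = {0, 1, 2}  # not defined for the COMPOUND procedure


def check_compound(minorversion, op_codes, supported_minor_versions=(0,)):
    """Return the status a server might give before executing anything."""
    # The minor version check comes first, so the mismatch error
    # takes precedence over every other error.
    if minorversion not in supported_minor_versions:
        return "NFS4ERR_MINOR_VERS_MISMATCH"
    # Assumption: reserved operation numbers are illegal whenever the
    # minor version itself is supported.
    if any(op in RESERVED_OPS for op in op_codes):
        return "NFS4ERR_OP_ILLEGAL"
    return "NFS4_OK"
```

Under this reading, check_compound(0, [2]) gives NFS4ERR_OP_ILLEGAL, while check_compound(7, [2]) on a server that only supports minor version 0 gives NFS4ERR_MINOR_VERS_MISMATCH.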

Thanks

Frank

Frank S Filz | 21 May 2013 20:45

Some language I think needs clarification in RFC 3530 bis

The following section:

9.4. Blocking Locks
Some clients require the support of blocking locks. The NFS version
4 protocol must not rely on a callback mechanism and therefore is
unable to notify a client when a previously denied lock has been
granted. Clients have no choice but to continually poll for the
lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. The server should maintain
an ordered list of pending blocking locks. When the conflicting lock
is released, the server may wait the lease period for the first
waiting client to re-request the lock. After the lease period
expires the next waiting client request is allowed the lock. Clients
are required to poll at an interval sufficiently small that it is
likely to acquire the lock in a timely manner. The server is not
required to maintain a list of pending blocked locks as it is used to
increase fairness and not correct operation. Because of the
unordered nature of crash recovery, storing of lock state to stable
storage would be required to guarantee ordered granting of blocking
locks.

Contains the statement:

The server should maintain an ordered list of pending blocking locks.

And then this statement:

The server is not required to maintain a list of pending blocked locks as it is used to increase fairness and not correct operation.

Could we change the "should" in that first statement to "may", so that it is not confused with the RFC meaning of SHOULD?

Or is that second statement not really right? Or is the first statement correctly a SHOULD and the second statement is giving the "out" for a server that doesn't want to follow a SHOULD (since it isn't a MUST)?
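
To show what the optional ordered list buys, here is a minimal Python sketch of the fairness mechanism the section describes: waiters are queued in arrival order, and after a release the first waiter has one lease period to re-request the lock before the next waiter gets its turn. The class and method names are hypothetical, the model covers a single lock, and giving each successive waiter its own lease-length window is an assumption, since the quoted text does not say what happens beyond the second waiter.

```python
from collections import deque


class BlockingLockQueue:
    """Toy single-lock model of READW/WRITEW fairness."""

    def __init__(self, lease_time):
        self.lease_time = lease_time
        self.waiters = deque()      # clients denied the lock, in order
        self.holder = None
        self.reserved_since = None  # when the head waiter's turn began

    def request(self, client, now):
        """Client polls for the lock; returns 'GRANTED' or 'DENIED'."""
        self._expire_reservations(now)
        head_ok = not self.waiters or self.waiters[0] == client
        if self.holder is None and head_ok:
            if self.waiters:
                self.waiters.popleft()
            self.holder = client
            return "GRANTED"
        if client not in self.waiters:
            self.waiters.append(client)
        return "DENIED"

    def release(self, now):
        """Holder unlocks; the head waiter's window starts now."""
        self.holder = None
        self.reserved_since = now

    def _expire_reservations(self, now):
        # Drop head waiters that failed to re-request within their window.
        while (self.holder is None and self.waiters
               and self.reserved_since is not None
               and now - self.reserved_since > self.lease_time):
            self.waiters.popleft()
            self.reserved_since += self.lease_time
```

A server that skips the list entirely still behaves correctly; it simply grants the lock to whichever poller arrives first after the release, which is the fairness problem the text mentions.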

Thanks

Frank

Matt W. Benjamin | 20 May 2013 21:12

Re: A more coherent data cache for NFS4.3 clients

Hi,

In combination with some type of data versioning, it is possible to get better results.  I believe this
concept exists in DFS as well as, of course, in AFS.

In addition to the 3 cases Marc mentions, the default case in the (extended version of the) AFS caching model
(called "extended callbacks") that I worked on, is a stronger version of case 2, in which any client is a
potential writer.  I think you get the same semantics as in Marc's case 2, if the presence of readers forces
write synchrony (direct mode, I guess) on the writers.

Data versioning becomes more complex in the presence of pNFS, but there are obvious ways to unify them.

Matt

----- "Trond Myklebust" <Trond.Myklebust <at> netapp.com> wrote:

> On Mon, 2013-05-20 at 10:33 -0700, Marc Eshel wrote:
> > The objective of this feature is to improve NFS client caching for
> > NFS4.3 
> > I would like to find out from the group if one, others see this as
> a
> > useful 
> > addition to NFS, and two, if extending the delegations feature is
> the
> > right 
> > approach. 
> > 
> > The requirements as I see them is to make NFS client caching
> possible
> > while 
> > maintaining its coherency without changing the NFS client
> > applications. Have 
> > better synchronization of data among applications on different NFS
> > clients 
> > while caching data. 
> > 
> > The options that we have today for more synchronization of data
> among
> > NFS 
> > clients is direct IO which means no data caching. With delegations
> we
> > can have 
> > caching on multiple NFS client readers or a single NFS client
> writer. 
> >   
> > It would be nice to make better use of delegations and more
> > specifically the 
> > combination of read write delegation. In general the just read
> > delegations are 
> > useful but the combination with write delegation are less useful
> for
> > most cases. 
> > 
> > Selecting delegation as the synchronization method and not byte
> range
> > locks 
> > have few reasons. The first objective is to not block applications
> > from read or 
> > write but to just synchronize the updates, this extension need to
> > coexist with 
> > current delegation and should be considered as a more refined
> subset
> > of the 
> > current delegations, and finally because we don't want to require
> > any changes 
> > to the application. 
> > 
> > One problem with delegation is that it is server and not client
> > initiated, so we 
> > might need to add some mount options or hints for the applications
> to
> > request 
> > this new behavior, or adding the option for the NFS client to
> request
> > a byte range 
> > delegation. 
> 
> Aside from the cases where O_DIRECT and locking are active, when
> would
> the client not want to request a delegation for something that it is
> reading?
> 
> > Delegation will be extended to include a byte range and delegation
> > callback 
> > will have to specify the part of the file that were changed by a
> > writer or being 
> > requested by  a reader. 
> > 
> > It might be required that the byte range be on block boundary. The
> > byte range 
> > might start with full file delegations, but it might be reduced to
> the
> > range that 
> > represent the cached portion of the file. We will have to put some
> > limit to the 
> > number of byte ranges to keep resources usage under control. 
> > 
> > The applications that can benefit from this feature are HPC
> > application, multi-node 
> > database programs, web server cluster, etc. 
> > 
> > 
> > case 1: 
> > Let's consider the case of multiple readers and one writer. In this
> > multi-reader-one-writer 
> > mode all clients will keep both the delegations and the cache and
> will
> > start with a full 
> > file byte range. 
> > When a reader makes a read of some byte range from the server the
> > server will make a 
> > callback to the writer, if it is holding a delegation for the
> > requested read byte range, 
> > requesting that this range be flushed to the server, the writer
> will
> >  flush the changed 
> > byte range if the range is dirty before the server can respond to
> the
> > reader. 
> > The writer might also consider giving up that byte range to avoid
> more
> > call backs. 
> > 
> > case 2: 
> > Again, all clients in this multi-reader-one-writer mode will maintain
> > their caches. 
> > When the writer writes to the server all readers will be notified
> of
> > the changed byte range. 
> > At this point the reader can purge that range from its cache if it
> was
> > there in the first place 
> > or read that range from the server to keep its cache up to date. It
> > can also give up the 
> > delegation if it is not interested in caching this range.
> > 
> > case 3: 
> > There are HPC applications that split the file into segments that each
> > node is working on. 
> > It will be nice to give each client a write delegation for the
> portion
> > of the file that it will
> > update, a good hint would be when the client punches a hole. 
> 
> ???? Why is that a hint?
> 
> Either way, please see
> 
> https://datatracker.ietf.org/doc/draft-myklebust-nfsv4-byte-range-delegations/
> 
> for a draft implementation of the subfile delegation concept.
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust <at> netapp.com
> www.netapp.com

--

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 

Marc Eshel | 20 May 2013 19:33

A more coherent data cache for NFS4.3 clients

The objective of this feature is to improve NFS client caching for NFS4.3.
I would like to find out from the group, one, whether others see this as a useful
addition to NFS and, two, whether extending the delegations feature is the right
approach.

The requirement, as I see it, is to make NFS client caching possible while
maintaining its coherency without changing the NFS client applications, and to
have better synchronization of data among applications on different NFS clients
while caching data.

The option that we have today for more synchronization of data among NFS
clients is direct IO, which means no data caching. With delegations we can have
caching on multiple NFS client readers or a single NFS client writer.

It would be nice to make better use of delegations, and more specifically of the
combination of read and write delegations. In general, read-only delegations are
useful, but the combination with write delegations is less useful in most cases.

There are a few reasons for selecting delegations, and not byte-range locks, as
the synchronization method. First, the objective is not to block applications
from reading or writing but just to synchronize the updates. Second, this
extension needs to coexist with current delegations and should be considered a
more refined subset of them. Finally, we don't want to require any changes
to the application.

One problem with delegations is that they are server-initiated, not client-initiated,
so we might need to add some mount options or hints for the applications to request
this new behavior, or add the option for the NFS client to request a byte-range
delegation.

Delegations will be extended to include a byte range, and the delegation callback
will have to specify the part of the file that was changed by a writer or is being
requested by a reader.

It might be required that the byte range be on a block boundary. The byte range
might start as a full-file delegation, but it might be reduced to the range that
represents the cached portion of the file. We will have to put some limit on the
number of byte ranges to keep resource usage under control.

The applications that can benefit from this feature are HPC applications, multi-node
database programs, web server clusters, etc.


case 1:
Let's consider the case of multiple readers and one writer. In this multi-reader-one-writer
mode all clients will keep both the delegations and the cache and will start with a full
file byte range.
When a reader reads some byte range from the server, the server will make a callback to
the writer, if it is holding a delegation for the requested byte range, requesting that
this range be flushed to the server. The writer will flush the changed byte range, if the
range is dirty, before the server can respond to the reader.
The writer might also consider giving up that byte range to avoid more callbacks.

case 2:
Again, all clients in this multi-reader-one-writer mode will maintain their caches.
When the writer writes to the server, all readers will be notified of the changed byte range.
At this point a reader can purge that range from its cache, if it was there in the first place,
or read that range from the server to keep its cache up to date. It can also give up the
delegation if it is not interested in caching this range.

case 3:
There are HPC applications that split the file into segments, with each node working on one.
It would be nice to give each client a write delegation for the portion of the file that it will
update; a good hint would be when the client punches a hole.
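
As a rough illustration of cases 1 and 2, here is a minimal Python sketch of how a server might work out which callbacks to send for a given read or write. Everything in it is an assumption made for the example: the RangeDelegation record, the "flush" and "changed" callback labels, and the choice to report only the overlapping part of each delegated range. None of it comes from an existing protocol definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeDelegation:
    client: str
    kind: str     # "read" or "write"
    offset: int
    length: int


def overlap(deleg, offset, length):
    """Return (offset, length) of the overlap, or None if disjoint."""
    start = max(deleg.offset, offset)
    end = min(deleg.offset + deleg.length, offset + length)
    return (start, end - start) if end > start else None


def _callbacks(delegs, requester, offset, length, kind, label):
    out = []
    for d in delegs:
        if d.kind == kind and d.client != requester:
            ov = overlap(d, offset, length)
            if ov is not None:
                out.append((d.client, label, ov[0], ov[1]))
    return out


def callbacks_for_read(delegs, reader, offset, length):
    """Case 1: ask writers to flush the part of the range they hold."""
    return _callbacks(delegs, reader, offset, length, "write", "flush")


def callbacks_for_write(delegs, writer, offset, length):
    """Case 2: tell readers which part of their range has changed."""
    return _callbacks(delegs, writer, offset, length, "read", "changed")
```

In case 1 the server would wait for the flush before answering the reader; the sketch only computes who needs to be called and for which bytes.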

Marc.