Nicolas Williams | 1 Sep 2005 03:07

Re: pNFS, OPEN principals and ACLs

On Wed, Aug 31, 2005 at 12:08:45PM -0700, Mike Eisler wrote:
> 
> 
> --- Nicolas Williams <Nicolas.Williams <at> sun.com> wrote:
> 
> > On Tue, Aug 30, 2005 at 08:22:39PM -0700, Mike Eisler wrote:
> > > What of the issue Garth raised? I propose that we introduce
> > > two new operations, GETFD and PUTFD. Like GETFH, GETFD is invoked in the
> > > same COMPOUND as OPEN, after OPEN. GETFD returns a file descriptor.
> > > PUTFD injects a file descriptor. Unlike file handles, file descriptors
> > > do not persist. Once a file is CLOSEd, the file descriptor used
> > > in the CLOSE is disposed of. PUTFD is used for any operation on
> > > a regular file that takes a stateid. PUTFD is used instead of PUTFH.
> > ...
> > 
> > Sounds close to an access token; why not deal with NFS as the backend
> 
> It is solving the same problem as the access token, but it isn't an access token.

But without a metadata server<->data server protocol the FD is not
enough.
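A minimal Python sketch of the GETFD/PUTFD lifecycle Mike proposes, assuming each server keeps its own in-memory descriptor table. The class and method names are hypothetical, not taken from any draft. It also illustrates Nicolas's objection: a data server that never issued an FD has no way to resolve it unless some metadata server<->data server protocol tells it about the FD.

```python
import itertools


class FdTable:
    """Hypothetical per-server table of non-persistent file descriptors.

    Models the proposed lifecycle: GETFD mints a descriptor after OPEN,
    PUTFD resolves it in place of PUTFH, and CLOSE disposes of it.
    """

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)
        self._open = {}  # fd -> file handle of the opened file

    def getfd(self, fh):
        # Issued in the same COMPOUND as OPEN, after OPEN.
        fd = next(self._counter)
        self._open[fd] = fh
        return fd

    def putfd(self, fd):
        # Used instead of PUTFH: yields the file handle behind the fd.
        if fd not in self._open:
            raise LookupError(f"{self.name}: unknown or closed fd {fd}")
        return self._open[fd]

    def close(self, fd):
        # Descriptors do not persist: CLOSE throws this one away.
        self._open.pop(fd, None)


mds = FdTable("mds")
ds = FdTable("ds1")
fd = mds.getfd(b"fh-42")
print(mds.putfd(fd))  # the metadata server resolves its own fd
try:
    ds.putfd(fd)      # the data server never issued it
except LookupError as err:
    print(err)
```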

> > protocol as with an OSD...  The client could get the FD from the
> > metadata server and then PUTFD the same thing on the data servers.  To
> > build a token out of this is easy, just structure it something like,
> > roughly:
> > 
> > { FD-data = {issuer, fd-index-local-to-issuer, expiration, FH, princ name,
> > 	     access-type},
> >   key-id, HMAC(K, <FD-data>) }
[...]
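A sketch of the token structure Nicolas outlines, assuming the metadata server and the data servers already share the key K named by key-id (distributing that key is exactly the metadata server<->data server protocol that is still missing). JSON encoding, HMAC-SHA-256, and the field names are illustrative choices, not part of any proposal.

```python
import hashlib
import hmac
import json
import time


def _encode(fd_data):
    # Canonical bytes so issuer and verifier compute the MAC over the same input.
    return json.dumps(fd_data, sort_keys=True).encode("utf-8")


def make_token(key_id, key, issuer, fd_index, expiration, fh, princ, access):
    """Build {FD-data, key-id, HMAC(K, FD-data)} on the metadata server."""
    fd_data = {
        "issuer": issuer,
        "fd_index": fd_index,
        "expiration": expiration,
        "fh": fh,
        "princ": princ,
        "access": access,
    }
    mac = hmac.new(key, _encode(fd_data), hashlib.sha256).hexdigest()
    return {"fd_data": fd_data, "key_id": key_id, "mac": mac}


def verify_token(token, keys, now=None):
    """Data-server check: known key-id, intact MAC, not yet expired."""
    key = keys.get(token["key_id"])
    if key is None:
        return False
    expected = hmac.new(key, _encode(token["fd_data"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["mac"]):
        return False
    now = time.time() if now is None else now
    return now < token["fd_data"]["expiration"]


keys = {"k1": b"shared-secret"}  # hypothetical key shared by MDS and data servers
tok = make_token("k1", keys["k1"], "mds1", 7, time.time() + 300,
                 "fh-42", "alice@EXAMPLE.COM", "read")
print(verify_token(tok, keys))
```

Because the MAC covers the principal, file handle and access type, a client cannot widen its access (say, from read to write) without the data server noticing.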

Marc Eshel | 1 Sep 2005 09:01

Re: pNFS and equivalent data servers

Brent Welch <welch <at> panasas.com> wrote on 08/31/2005 03:39:28 PM:

> I'm worried we are conflating three different things:
> multiple paths to the same device,
> redundant or "equivalent" data servers,
> error recovery.
> 
I don't see them as different things at all; they all have a related
objective, which is high availability.

> In addition, we are trying to optimize performance of an error case,
> which should be viewed with suspicion. The important thing
> to get right in the failure case is that storage is consistent,
> and that the server implementation isn't horribly complex,
> and that the optimization for the error case doesn't slow
> down the non-error case.
> 
Suspicion that it is used for something other than error recovery? What else
can you use it for? If you have some good ideas there, please share :) I
don't see how this slows down the non-error case. In a setup where you have
a few hundred nodes and a few thousand disks, and you stripe a multi-gigabyte
file over all of them, error conditions are routine and should not be
regarded as isolated events.
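To make Marc's point concrete, a back-of-the-envelope calculation, assuming independent disk failures and a hypothetical 2% annual failure rate per disk (not a figure from this thread). With 3,000 disks that is about 60 failures a year, more than one a week, and roughly a two-in-three chance that at least one disk fails in any given week.

```python
def expected_failures(n_disks, annual_failure_rate, years=1.0):
    """Expected disk failures over the period, assuming independent disks."""
    return n_disks * annual_failure_rate * years


def p_any_failure(n_disks, annual_failure_rate, years=1.0):
    """Probability that at least one disk fails during the period.

    Uses annual_failure_rate * years as each disk's per-period failure
    probability, which is only reasonable while that product is small.
    """
    p = annual_failure_rate * years
    return 1.0 - (1.0 - p) ** n_disks


# Hypothetical numbers: 3,000 disks, 2% annual failure rate each.
print(expected_failures(3000, 0.02))            # failures per year
print(expected_failures(3000, 0.02) / 52)       # failures per week
print(p_any_failure(3000, 0.02, years=1 / 52))  # chance of >= 1 failure in a week
```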

> Finally, you are also making some assumptions about the back
> end that don't apply everywhere.  For example, the analog of
> a file handle in the object world is a capability, and they are
> by definition specific to a device.  You simply can't give them
> to the wrong device and have them work.  I can also imagine
> there could be files-based pNFS back-ends where the servers don't
[...]

William A.(Andy) Adamson | 1 Sep 2005 15:41

Re: pNFS and equivalent data servers

brent, what do you think of this.

i want to export your panasas fs to another proprietary cluster fs across the 
WAN. so, i place pNFS meta/data servers on panasas client nodes, and export 
the panasas fs to the WAN. i do the same for, say, lustre, GPFS, or a block 
layout cluster. i can then use the generic file layout clients with full 
nfsv4.0 security to access both clusters and copy your panasas data from one 
cluster to another. in this scenario, the pNFS servers exporting the panasas 
cluster are in the same situation as pNFS servers on a GPFS cluster, and have 
the same issues - each pNFS server can be a metadata server or data server, 
multiple paths, etc. this is a good use of pNFS, and should be considered in 
the protocol design.

-->Andy 

Brent Welch | 2 Sep 2005 00:49

Re: pNFS and equivalent data servers

Sure, Andy, what you suggest is fine.  It doesn't relate directly
to the debate (as I understand it), which is about the semantics of
having multiple addresses for one "device".

I agree when Marc says that multi-path routing, redundant servers,
and robust error handling are all related.  But, I'm only trying
to point out that they are not the same thing;  instead, they are
different parts of an overall architecture.  I am worried that
folks want to burn into the pNFS spec that if a device has multiple
addresses, it might actually be a magic device that lives in two
physical locations and has some semantics about the data in those
two locations.  That is not a "device" -> that is a "layout".
That's exactly what a layout is, right? It is a distribution of data
across multiple devices, with various semantics about the relationship
between the distributed data.
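One way to read Brent's distinction in code: a device only knows its own addresses (several addresses meaning several paths to the same storage), while the layout carries the semantics of how data is spread across devices. Simple round-robin striping is assumed here purely for illustration; the class names are hypothetical and this is not the pNFS file layout encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """One storage target; extra addresses are just extra paths to it."""
    device_id: int
    addresses: list = field(default_factory=list)


@dataclass
class StripeLayout:
    """Hypothetical round-robin stripe of a file across several devices.

    The relationship between the pieces of data lives here, in the layout,
    rather than in any single Device.
    """
    stripe_unit: int   # bytes per stripe unit
    device_ids: list   # order in which stripe units rotate over devices

    def __post_init__(self):
        if self.stripe_unit <= 0 or not self.device_ids:
            raise ValueError("need a positive stripe unit and at least one device")

    def device_for(self, offset):
        # Find the stripe unit holding this byte, then the device holding it.
        unit = offset // self.stripe_unit
        return self.device_ids[unit % len(self.device_ids)]


dev = Device(10, ["192.0.2.1", "192.0.2.2"])  # two paths, one device
layout = StripeLayout(stripe_unit=64 * 1024, device_ids=[10, 11, 12])
print(layout.device_for(200_000))
```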

As I said, I am completely in favor of allowing a single device to
have multiple addresses to support multi-path access to that device.
We have to agree that a device is a block storage device, or an
object storage device, or a file server.  In contrast, an NFS
head over top a cluster file system is not a "device".  It is something
much more clever with loads of semantics, and hiding those semantics
in GETDEVICEINFO is not the right thing to do.

Next, my comment about optimization of the error case is this:
Be careful that your optimizations don't actually make error recovery
more difficult, and ultimately lead to complex servers, data corruption
bugs, or both.  Error recovery is a property of the layout
(i.e., it is a property of the relationship among the distributed
data).  If you want to define a layout that has fast and loose
[...]

Marc Eshel | 2 Sep 2005 04:19

Re: pNFS and equivalent data servers

Brent Welch <welch <at> panasas.com> wrote on 09/01/2005 03:49:44 PM:

> Sure, Andy, what you suggest is fine.  It doesn't relate directly
> to the debate (as I understand it), which is about the semantics of
> having multiple addresses for one "device".
> 
> I agree when Marc says that multi-path routing, redundant servers,
> and robust error handling are all related.  But, I'm only trying
> to point out that they are not the same thing;  instead, they are
> different parts of an overall architecture.  I am worried that
> folks want to burn into the pNFS spec that if a device has multiple
> addresses, it might actually be a magic device that lives in two
> physical locations and has some semantics about the data in those
> two locations.  That is not a "device" -> that is a "layout".
> That's exactly what a layout is, right? it is a distribution of data
> across multiple devices, with various semantics about the relationship
> between the distributed data.

The main motivation here is to get multi-path access to the same device. When
one of the nodes in the cluster fails (for example, an I/O request times out),
the client can use a different node in the cluster to get to the same
devices.
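A sketch of the failover Marc describes, assuming a device entry lists several addresses that all reach the same storage, and that the transport signals a timed-out request by raising an exception. `send_read` and `PathTimeout` are hypothetical names standing in for the client's real I/O path.

```python
class PathTimeout(Exception):
    """Raised by the (hypothetical) transport when a request times out."""


def read_with_failover(addresses, send_read):
    """Send the read to each address of one device until a path answers.

    Returns (address_used, data). Every address is assumed to reach the
    same storage, so retrying a read elsewhere is assumed to be safe.
    """
    failures = []
    for addr in addresses:
        try:
            return addr, send_read(addr)
        except PathTimeout as exc:
            failures.append(f"{addr}: {exc}")
    raise PathTimeout("all paths failed: " + "; ".join(failures))


def fake_send_read(addr):
    # Simulates node-a having failed while node-b is healthy.
    if addr == "node-a":
        raise PathTimeout("request timed out")
    return b"block data"


print(read_with_failover(["node-a", "node-b"], fake_send_read))
```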

> As I said, I am completely in favor of allowing a single device to
> have multiple addresses to support multi-path access to that device.
> We have to agree that a device is a block storage device, or an
> object storage device, or a file server.  In contrast, an NFS
> head over top a cluster file system is not a "device".  It is something
> much more clever with loads of semantics, and hiding those semantics
> in GETDEVICEINFO is not the right thing to do.
[...]

Marc Eshel | 2 Sep 2005 19:39

pNFS some minor changes

Since a new draft is coming out soon, I wanted to bring up a few changes. I
think that LAYOUTGET should have the same capability that GETDEVICELIST
has: getting a long list of devices in multiple calls using a cookie.
Another request that I made before is to be able to recall the device list
from the client (invalidate its device list cache) so the server doesn't
have to maintain ever-increasing device IDs. Garth said
"reasonable request", Dave Noveck said "no"; does anyone else have an
opinion? This would be my first choice, but if there is no consensus for it
I would at least request that the device ID change from 32 bits to 64 bits.
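A minimal model of the cookie-based continuation Marc asks for, in the style he attributes to GETDEVICELIST. The cookie is assumed to be just the index of the next entry, and `get_device_page`/`fetch_all_devices` are hypothetical helpers; a real server could encode anything it likes in the cookie.

```python
def get_device_page(all_ids, cookie, max_entries):
    """Server side: one page of device IDs plus the cookie for the next call.

    The cookie is just the index of the next entry (an assumption); eof
    tells the client nothing is left.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    page = all_ids[cookie:cookie + max_entries]
    next_cookie = cookie + len(page)
    return page, next_cookie, next_cookie >= len(all_ids)


def fetch_all_devices(all_ids, max_entries):
    """Client side: repeat the call with the returned cookie until eof."""
    cookie, result, calls = 0, [], 0
    while True:
        page, cookie, eof = get_device_page(all_ids, cookie, max_entries)
        result.extend(page)
        calls += 1
        if eof:
            return result, calls


# Ten devices, but only nine fit in one reply: two calls fetch them all.
devices, calls = fetch_all_devices(list(range(100, 110)), max_entries=9)
print(devices, calls)
```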

_______________________________________________
nfsv4 mailing list
nfsv4 <at> ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4

Garth Goodson | 2 Sep 2005 20:01

Re: pNFS some minor changes

Marc Eshel wrote:
> Since a new draft is coming out soon I wanted to bring up few changes. I 
> think that LAYOUTGET should have the same capability that GETDEVICELIST 
> has to get a long list of devices in multiple calls using a cookie.

The reason we haven't done this is that we are using a shorthand for
the device ID (8 bytes; I've increased it to 64 bits) and that the
server can always hand back a layout that covers a smaller byte range
than was asked for (in the case of a very long list).  The client can
always make calls to get multiple layouts to cover the range it needs 
(if it really can't fit in the 32KB or whatever the transport allows). 
We could add a field that specifies the maximum byte count to ensure the 
appropriate size is returned.
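Garth's alternative, sketched from the client side: keep issuing LAYOUTGET for whatever part of the range is still uncovered. `layoutget` is a hypothetical callable returning an (offset, length) segment that may be shorter than requested; the progress check guards against a server that returns nothing useful.

```python
def layouts_to_cover(offset, length, layoutget):
    """Issue LAYOUTGET-style calls until [offset, offset + length) is covered.

    layoutget(offset, length) is a hypothetical stand-in returning a
    (seg_offset, seg_length) segment that may be shorter than requested.
    """
    end = offset + length
    segments = []
    while offset < end:
        seg_off, seg_len = layoutget(offset, end - offset)
        if seg_off > offset or seg_off + seg_len <= offset:
            raise RuntimeError("layout leaves a gap or makes no progress")
        segments.append((seg_off, seg_len))
        offset = seg_off + seg_len
    return segments


def capped_layoutget(offset, length, cap=4 * 1024 * 1024):
    # Pretend the server only ever hands back at most 4 MiB per layout.
    return offset, min(length, cap)


print(layouts_to_cover(0, 10 * 1024 * 1024, capped_layoutget))
```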

> Another request that I made before is to be able to recall the device list 
> from the client (invalidate its device list cache) so the server doesn't 
> have to maintain this for ever increasing device id. Garth said 
> "reasonable request", Dave Noveck said "no", does anyone else has an 
> opinion? 

I did say that, but I'm still not convinced it is necessary.  I think 
the device ID space is large enough that reuse of IDs is not an issue. 
The client is able to maintain its own cache of IDs, since it can always 
reget them.  If mappings change, the server can recall the appropriate 
layouts and clients can demand the new IDs.  I understand that you want 
to avoid another level of ID/node mapping, but depending on the 
multi-pathing issue, you would have to do so anyway.
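A rough check of whether device-ID reuse matters, assuming IDs come from a counter that is never reused and a hypothetical allocation rate of 1,000 new IDs per second. Under that assumption a 32-bit space wraps in roughly 50 days, while a 64-bit space lasts hundreds of millions of years.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def seconds_to_exhaust(id_bits, ids_per_second):
    """Seconds until a never-reused counter of id_bits bits runs out."""
    return (2 ** id_bits) / ids_per_second


RATE = 1000  # hypothetical: new device IDs allocated per second
print(seconds_to_exhaust(32, RATE) / SECONDS_PER_DAY)   # days for 32-bit IDs
print(seconds_to_exhaust(64, RATE) / SECONDS_PER_YEAR)  # years for 64-bit IDs
```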

> This would be my first choice but if there is no consensus for it 
> I would at least request that the device id change from 32bit to 64 bit. 
[...]

Marc Eshel | 2 Sep 2005 20:02

pNFS and NFSv3

The last draft says in 2.4, "Storage Protocol", that the data server can be
NFSv3. I assume that would be a new layout type; can we add it to the
draft?
Marc. 

Garth Goodson | 2 Sep 2005 20:13

Re: pNFS and NFSv3

Marc Eshel wrote:
> The last draft says in 2.4, "Storage Protocol", that the data server can be
> NFSv3. I assume that would be a new layout type; can we add it to the
> draft?
> Marc. 
> 

It is saying that v3 is a possible storage protocol for files (perhaps I 
should remove this reference, or clarify it).  We have avoided defining 
such a storage protocol since it doesn't fit within the v4 security 
framework.  It could be added later as an additional protocol (such as 
is being done for blocks and objects), however, the main spec is only 
going to specify v4 as a storage protocol.

-Garth


Marc Eshel | 2 Sep 2005 20:48

Re: pNFS some minor changes

Garth Goodson <Garth.Goodson <at> netapp.com> wrote on 09/02/2005 11:01:28 AM:

> Marc Eshel wrote:
> > Since a new draft is coming out soon I wanted to bring up few changes. I
> > think that LAYOUTGET should have the same capability that GETDEVICELIST
> > has to get a long list of devices in multiple calls using a cookie.
> 
> The reason we haven't done this is since we are using a shorthand for 
> the device ID (8 bytes -- I've increased it to 64 bit) and since the 
> server can always hand back a layout that covers a smaller byte range 
> than was asked for (in the case of a very long list).  The client can 
> always make calls to get multiple layouts to cover the range it needs 
> (if it really can't fit in the 32KB or whatever the transport allows). 
> We could add a field that specifies the maximum byte count to ensure the
> appropriate size is returned.

If you are describing the striping layout where you can fit only 9 of your
10 devices in the first message, that means you will now need to get the
rest of the layout for the file in a one-to-one mapping, which can be a
large number of messages for a large file. If instead you could say that a
reply is a continuation of the first message, then you can get the complete
set in 2 messages. Having said that, if we can send a message of size 32K,
which can describe about 4000 devices, that should be enough, but less
would be a problem.
Marc.
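The arithmetic behind Marc's estimate, assuming "32K" means 32 KiB and each entry is a 64-bit (8-byte) device ID: 32768 / 8 = 4096 IDs before any header overhead, so "about 4000" leaves a few hundred bytes for the rest of the reply. The overhead value below is hypothetical.

```python
DEVICE_ID_BYTES = 8  # one 64-bit device ID


def ids_per_reply(reply_bytes, overhead_bytes=0):
    """How many 8-byte device IDs fit in one reply after fixed overhead."""
    return max(0, (reply_bytes - overhead_bytes) // DEVICE_ID_BYTES)


def replies_needed(n_devices, reply_bytes, overhead_bytes=0):
    """Replies needed for n_devices IDs if a reply may continue the last one."""
    per_reply = ids_per_reply(reply_bytes, overhead_bytes)
    if per_reply == 0:
        raise ValueError("reply too small to carry any device ID")
    return -(-n_devices // per_reply)  # ceiling division


print(ids_per_reply(32 * 1024))                      # no overhead at all
print(ids_per_reply(32 * 1024, overhead_bytes=768))  # hypothetical header cost
print(replies_needed(10, 9 * DEVICE_ID_BYTES))       # 9 of 10 fit per reply
```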
