Jeff Layton | 1 Apr 2006 14:16

Re: Q: NFS server cluster

On Fri, 2006-03-31 at 16:02 +0200, Chris Osicki wrote:
> Is there anything I can do to eliminate this behaviour?
> If I understand correctly, the problem is on the server side; for the
> client nothing changes: the same IP address to talk to, the same
> filesystem/filehandle. Or am I missing something?
> 

Be sure the blockdev has the same device major/minor number on both
hosts, though it sounds like you may already have checked that.

Otherwise, this sounds a lot like an issue with ARP. When you float IP
addresses you need to have the new address owner send gratuitous ARPs so
that its neighbors update their caches. Make sure your failover software
is doing this, and that the client and/or router isn't ignoring them.
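
For reference, a gratuitous ARP is an ARP packet in which the sender and target protocol addresses are both the address being announced, usually sent to the Ethernet broadcast address. The C sketch below (Linux/POSIX headers assumed) only builds the 28-byte ARP payload described in RFC 826. Actually transmitting it needs a raw AF_PACKET socket and root, which failover software normally handles. The struct and function names are hypothetical.

```c
#include <arpa/inet.h>   /* htons() */
#include <netinet/in.h>  /* struct in_addr */
#include <stdint.h>
#include <string.h>

/* ARP payload for Ethernet + IPv4 (RFC 826). Every field sits on a
 * natural boundary, so the compiler adds no padding. */
struct arp_eth_ipv4 {
    uint16_t htype;  /* hardware type: 1 = Ethernet */
    uint16_t ptype;  /* protocol type: 0x0800 = IPv4 */
    uint8_t  hlen;   /* hardware address length: 6 */
    uint8_t  plen;   /* protocol address length: 4 */
    uint16_t oper;   /* 1 = request, 2 = reply */
    uint8_t  sha[6]; /* sender MAC: the new owner of the address */
    uint8_t  spa[4]; /* sender IP: the floated address */
    uint8_t  tha[6]; /* target MAC: ignored in a request */
    uint8_t  tpa[4]; /* target IP: same as spa, making it "gratuitous" */
};
_Static_assert(sizeof(struct arp_eth_ipv4) == 28, "unexpected padding");

/* Fill in a gratuitous ARP request announcing that `ip` (network byte
 * order) is now reachable at `mac`. */
static void build_gratuitous_arp(struct arp_eth_ipv4 *p,
                                 const uint8_t mac[6], struct in_addr ip)
{
    p->htype = htons(1);
    p->ptype = htons(0x0800);
    p->hlen  = 6;
    p->plen  = 4;
    p->oper  = htons(1);
    memcpy(p->sha, mac, 6);
    memcpy(p->spa, &ip.s_addr, 4);
    memset(p->tha, 0, 6);
    memcpy(p->tpa, &ip.s_addr, 4);
}
```

Neighbours that already have the floated IP in their cache overwrite the old MAC with sha when they see such a request, which is the cache update the failover needs.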

-- Jeff

_______________________________________________
NFS maillist  -  NFS <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

Jeff Layton | 1 Apr 2006 16:17

[PATCH] lockd: add procfs control to cue lockd to release all locks on a device

There's been a long-standing problem with the Linux NFS implementation. If a
client has active POSIX locks on a filesystem, it cannot be unmounted
even if you have unexported it and killed any userspace processes that
are actively working in it. This is especially a problem in clustered
NFS setups, as it can prevent a successful failover from occurring.

There is an existing workaround, which is to send a SIGKILL to lockd.
Unfortunately, that makes it drop all of its locks -- even ones on
filesystems that aren't failing over. This is bad in a cluster with
multiple NFS services that fail over independently, or on hosts with a
mix of clustered and non-clustered NFS shares.

This patch attempts to remedy this by adding a new procfs file
(/proc/fs/lockd/release_device). Echoing into it the dev_t value of the
block device that holds the underlying filesystem will tell lockd to drop
all locks on that device. I considered implementing this via sysfs or
configfs, but it wasn't clear to me where this would fit into the
hierarchy of either.

I've tested this and it works correctly. Comments and suggestions are
certainly welcome.

Thanks,
Jeff

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -34,11 +34,15 @@
 #include <linux/sunrpc/svcsock.h>
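
A userspace caller of the proposed interface might look like the C sketch below. It is hypothetical in two ways: the file only exists with this patch applied, and the accepted format is assumed here to be the decimal st_dev of the filesystem as reported by stat(2). That assumption matters because the kernel's internal dev_t packs major and minor differently from the value stat(2) returns, so the patch itself is the authority on what to write. Both function names are made up.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* ASSUMPTION: the proc file takes the filesystem's st_dev in decimal,
 * followed by a newline. Returns the request length or -errno. */
static int format_release_request(const char *mnt, char *buf, size_t len)
{
    struct stat st;

    if (stat(mnt, &st) != 0)
        return -errno;
    int n = snprintf(buf, len, "%llu\n", (unsigned long long)st.st_dev);
    return (n < 0 || (size_t)n >= len) ? -ENOSPC : n;
}

/* Ask lockd to drop all locks on the filesystem mounted at `mnt`.
 * Fails with -ENOENT on kernels without the proposed proc file. */
static int release_locks_on(const char *mnt)
{
    char req[32];
    int n = format_release_request(mnt, req, sizeof req);

    if (n < 0)
        return n;
    FILE *f = fopen("/proc/fs/lockd/release_device", "w");
    if (f == NULL)
        return -errno;
    int rc = (fputs(req, f) == EOF) ? -EIO : 0;
    if (fclose(f) != 0 && rc == 0)
        rc = -errno;   /* the write is only flushed to the kernel here */
    return rc;
}
```

In a failover script this would run after unexporting the filesystem and before unmounting it, in place of sending SIGKILL to lockd.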

Neil Brown | 3 Apr 2006 03:17

Re: [NFS] [PATCH] knfsd: Correct reserved reply space for read requests.

On Thursday March 30, eshel <at> almaden.ibm.com wrote:
> Hi Neil
> Can we use this opportunity to change NFSSVC_MAXBLKSIZE from 32K to 64K to 
> match RPCSVC_MAXPAYLOAD. It makes a real difference in I/O performance (we 
> will still be saving half the space we used to allocate :).
> Thanks, Marc. 

Maybe... but why not 128K ??

There is certainly room to increase NFSSVC_MAXBLKSIZE, but I feel that
it needs to be done together with a more detailed look at consequences
in the network layer, particularly TCP window sizes.  I wouldn't mind
making a CONFIG_ tunable without that detailed look, but making it a
fixed change I feel less comfortable about.

NeilBrown
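
Neil's "CONFIG_ tunable" idea could take roughly the following shape. This is a sketch only: CONFIG_NFSD_MAXBLKSIZE is a hypothetical option name, RPCSVC_MAXPAYLOAD is set to 64K because that is what Marc's message implies, and the per-thread page formula (payload pages plus two for headers) is an assumed cost model. It is only there to show that reserved buffer memory grows linearly with the chosen block size.

```c
#include <stddef.h>

#define RPCSVC_MAXPAYLOAD (64 * 1024)  /* 64K, as implied in the thread */

/* Hypothetical build-time knob; defaults to the current 32K. */
#ifndef CONFIG_NFSD_MAXBLKSIZE
#define CONFIG_NFSD_MAXBLKSIZE (32 * 1024)
#endif
#define NFSSVC_MAXBLKSIZE CONFIG_NFSD_MAXBLKSIZE

/* Refuse to build a server that advertises more than one RPC can carry. */
_Static_assert(NFSSVC_MAXBLKSIZE <= RPCSVC_MAXPAYLOAD,
               "NFSSVC_MAXBLKSIZE must not exceed RPCSVC_MAXPAYLOAD");

/* ASSUMED cost model: pages reserved per server thread for a READ reply,
 * i.e. the payload rounded up to whole pages plus two for headers. */
static size_t reply_pages(size_t blksize, size_t page_size)
{
    return (blksize + page_size - 1) / page_size + 2;
}
```

With 4K pages this model gives 10 pages per thread at 32K and 18 at 64K, to be multiplied by the number of nfsd threads. Building with -DCONFIG_NFSD_MAXBLKSIZE=65536 would select the larger size, and anything above 64K would fail at compile time.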

Marc Eshel | 3 Apr 2006 05:28

Re: [PATCH] knfsd: Correct reserved reply space for read requests.

nfs-admin <at> lists.sourceforge.net wrote on 04/02/2006 06:17:17 PM:

> On Thursday March 30, eshel <at> almaden.ibm.com wrote:
> > Hi Neil
> > Can we use this opportunity to change NFSSVC_MAXBLKSIZE from 32K to 64K to
> > match RPCSVC_MAXPAYLOAD. It makes a real difference in I/O performance (we
> > will still be saving half the space we used to allocate :).
> > Thanks, Marc. 
> 
> Maybe... but why not 128K ??

Yes, it would be nice to match the Linux client side, which can go up to
1MB.

> There is certainly room to increase NFSSVC_MAXBLKSIZE, but I feel that
> it needs to be done together with a more detailed look at consequences
> in the network layer, particularly TCP window sizes.  I wouldn't mind
> making a CONFIG_ tunable without that detailed look, but making it a
> fixed change I feel less comfortable about.

As you said, it will need much more work to do it right, and also to avoid
wasting space on every RPC request, for which the maximum possible size is
allocated up front. But until then, why not take advantage of this minor
change, which can give us a significant performance improvement? I ran with
NFSSVC_MAXBLKSIZE set to 64K (the only change I made) and saw a 10%-15%
improvement for NFS reads. Is anyone out there looking at making this
improvement?


Neil Brown | 3 Apr 2006 05:45

Re: [NFS] [PATCH] knfsd: Correct reserved reply space for read requests.

On Sunday April 2, eshel <at> almaden.ibm.com wrote:
> nfs-admin <at> lists.sourceforge.net wrote on 04/02/2006 06:17:17 PM:
> 
> > On Thursday March 30, eshel <at> almaden.ibm.com wrote:
> > > Hi Neil
> > > Can we use this opportunity to change NFSSVC_MAXBLKSIZE from 32K to 64K to
> > > match RPCSVC_MAXPAYLOAD. It makes a real difference in I/O performance (we
> > > will still be saving half the space we used to allocate :).
> > > Thanks, Marc. 
> > 
> > Maybe... but why not 128K ??
>  
> Yes, it would be nice to match the Linux client side, which can go up to
> 1MB.
> 
> > There is certainly room to increase NFSSVC_MAXBLKSIZE, but I feel that
> > it needs to be done together with a more detailed look at consequences
> > in the network layer, particularly TCP window sizes.  I wouldn't mind
> > making a CONFIG_ tunable without that detailed look, but making it a
> > fixed change I feel less comfortable about.
> 
> As you said, it will need much more work to do it right, and also to avoid
> wasting space on every RPC request, for which the maximum possible size is
> allocated up front. But until then, why not take advantage of this minor
> change, which can give us a significant performance improvement? I ran with
> NFSSVC_MAXBLKSIZE set to 64K (the only change I made) and saw a 10%-15%
> improvement for NFS reads. Is anyone out there looking at making this
> improvement?

NeilBrown | 3 Apr 2006 07:18

[PATCH 001 of 16] knfsd: locks: flag NFSv4-owned locks


Use the fl_lmops field to identify which locks are ours, instead of trying
to look them up in our private hash.  This is safer and more efficient.

Earlier versions of this patch used a lock flag instead, but Trond pointed
out that adding a new flag for each lock manager wasn't going to scale
well, and suggested this approach instead; a separate patch converts lockd
to using fl_lmops in the same way.

In the NFSv4 case this looks like a bit of a hack, since the NFSv4 server
isn't currently actually defining a lock_manager_operations struct, so we
end up defining one *just* to serve as a cookie to identify our locks.

But it works, and we actually do expect to start using the
lock_manager_operations at some point anyway.

Signed-off-by: Marc Eshel <eshel <at> almaden.ibm.com>
Signed-off-by: Andy Adamson <andros <at> citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields <at> citi.umich.edu>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./fs/nfsd/nfs4state.c |   38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff ./fs/nfsd/nfs4state.c~current~ ./fs/nfsd/nfs4state.c
--- ./fs/nfsd/nfs4state.c~current~	2006-04-03 15:12:03.000000000 +1000
+++ ./fs/nfsd/nfs4state.c	2006-04-03 15:12:03.000000000 +1000
@@ -2495,36 +2495,27 @@ nfs4_transform_lock_offset(struct file_l
 		lock->fl_end = OFFSET_MAX;
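
The idea of using the fl_lmops pointer as an ownership cookie can be shown in isolation. The C sketch below uses simplified stand-in structs rather than the kernel's definitions, and the cookie and function names are made up. It only illustrates that an otherwise empty lock_manager_operations instance, compared by address, identifies a lock's owner without a hash lookup, and that another lock manager such as lockd can do the same with its own instance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct lock_manager_operations {
    void (*placeholder)(void);   /* real callbacks omitted */
};

struct file_lock {
    const struct lock_manager_operations *fl_lmops;
    long fl_start, fl_end;
};

/* Empty ops tables used purely as cookies: each one has a unique address,
 * so a pointer comparison tells which lock manager owns a lock. */
static const struct lock_manager_operations nfsd4_lmops_cookie = { NULL };
static const struct lock_manager_operations lockd_lmops_cookie = { NULL };

/* Tag a lock as created by the NFSv4 server. */
static void nfsd4_claim_lock(struct file_lock *fl)
{
    fl->fl_lmops = &nfsd4_lmops_cookie;
}

/* O(1) ownership test, no lookup in a private hash table. */
static bool nfsd4_owns_lock(const struct file_lock *fl)
{
    return fl->fl_lmops == &nfsd4_lmops_cookie;
}
```

Unlike a per-manager flag bit, this needs no new flag value for each lock manager, which is the scaling concern attributed to Trond above.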

NeilBrown | 3 Apr 2006 07:18

[PATCH 000 of 16] knfsd: Introduction

Groan, I missed the merge window, but that's for new features, and these
are bug fixes...

The following 16 patches fix various bugs in the NFS server, mostly
related to NFSv4, but there are a couple that are more generally
applicable (005 in particular).

They are against 2.6.16-mm2 and should be suitable for 2.6.17-rc2.

Thanks,
NeilBrown

 [PATCH 001 of 16] knfsd: locks: flag NFSv4-owned locks
 [PATCH 002 of 16] knfsd: nfsd4: Wrong error handling in nfs4acl
 [PATCH 003 of 16] knfsd: nfsd4: better nfs4acl errors
 [PATCH 004 of 16] knfsd: nfsd4: fix acl xattr length return
 [PATCH 005 of 16] knfsd: nfsd: oops exporting nonexistent directory
 [PATCH 006 of 16] knfsd: nfsd: nfsd_setuser doesn't really need to modify rqstp->rq_cred.
 [PATCH 007 of 16] knfsd: nfsd4: remove nfsd_setuser from putrootfh
 [PATCH 008 of 16] knfsd: nfsd4: fix corruption of returned data when using 64k pages
 [PATCH 009 of 16] knfsd: nfsd4: fix corruption on readdir encoding with 64k pages
 [PATCH 010 of 16] knfsd: svcrpc: gss: don't call svc_take_page unnecessarily
 [PATCH 011 of 16] knfsd: svcrpc: WARN() instead of returning an error from svc_take_page
 [PATCH 012 of 16] knfsd: nfsd4: fix laundromat shutdown race
 [PATCH 013 of 16] knfsd: nfsd4: nfsd4_probe_callback cleanup
 [PATCH 014 of 16] knfsd: nfsd4: add missing rpciod_down()
 [PATCH 015 of 16] knfsd: nfsd4: limit number of delegations handed out.
 [PATCH 016 of 16] knfsd: nfsd4: grant delegations more frequently

NeilBrown | 3 Apr 2006 07:18

[PATCH 003 of 16] knfsd: nfsd4: better nfs4acl errors


We're returning -1 in a few places in the NFSv4<->POSIX acl translation
code where we could return a reasonable error.

Also allows some minor simplification elsewhere.

Signed-off-by: J. Bruce Fields <bfields <at> citi.umich.edu>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./fs/nfsd/nfs4acl.c |    6 +++---
 ./fs/nfsd/nfs4xdr.c |    7 +++----
 2 files changed, 6 insertions(+), 7 deletions(-)

diff ./fs/nfsd/nfs4acl.c~current~ ./fs/nfsd/nfs4acl.c
--- ./fs/nfsd/nfs4acl.c~current~	2006-04-03 15:12:05.000000000 +1000
+++ ./fs/nfsd/nfs4acl.c	2006-04-03 15:12:06.000000000 +1000
@@ -710,9 +710,9 @@ calculate_posix_ace_count(struct nfs4_ac
 		/* Also, the remaining entries are for named users and
 		 * groups, and come in threes (mask, allow, deny): */
 		if (n4acl->naces < 7)
-			return -1;
+			return -EINVAL;
 		if ((n4acl->naces - 7) % 3)
-			return -1;
+			return -EINVAL;
 		return 4 + (n4acl->naces - 7)/3;
 	}
 }
@@ -866,7 +866,7 @@ nfs4_acl_add_ace(struct nfs4_acl *acl, u
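
The branch shown in the first hunk can be lifted into a self-contained C function to show why -EINVAL is the better return value. If it is propagated as an errno, a bare -1 reads as -EPERM (EPERM is 1 on Linux), whereas -EINVAL tells the caller the ACL was malformed. Only that one branch of calculate_posix_ace_count() is reproduced here, and both function names are hypothetical.

```c
#include <errno.h>

/* The visible branch of calculate_posix_ace_count(): after the first 7
 * ACEs, named users and groups come in threes (mask, allow, deny). */
static int posix_ace_count_from_naces(int naces)
{
    if (naces < 7)
        return -EINVAL;          /* was: return -1 */
    if ((naces - 7) % 3)
        return -EINVAL;          /* was: return -1 */
    return 4 + (naces - 7) / 3;
}

/* A caller can now pass the error straight up instead of inventing one. */
static int convert_acl(int naces, int *out_count)
{
    int n = posix_ace_count_from_naces(naces);

    if (n < 0)
        return n;
    *out_count = n;
    return 0;
}
```

Passing the negative errno through unchanged is presumably also the "minor simplification elsewhere" the description mentions, though the nfs4xdr.c hunk is not shown here.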

NeilBrown | 3 Apr 2006 07:18

[PATCH 004 of 16] knfsd: nfsd4: fix acl xattr length return


We should be using the length from the second vfs_getxattr, in case it
changed.  (Note: there's still a small race here; we could end up returning
-ENOMEM if the length increased between the first and second call.  I don't
know whether it's worth spending a lot of effort to fix that.)

This makes XFS ACLs usable on NFS exports, which they currently aren't,
since XFS appears to be returning a too-large value for vfs_getxattr() when
it's passed a NULL buffer.  So there's probably an XFS bug here too, though
since getxattr with a NULL buffer is usually used to decide how much memory
to allocate, it may be a fairly harmless bug in most cases.

Signed-off-by: J. Bruce Fields <bfields <at> citi.umich.edu>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./fs/nfsd/vfs.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff ./fs/nfsd/vfs.c~current~ ./fs/nfsd/vfs.c
--- ./fs/nfsd/vfs.c~current~	2006-04-03 15:12:07.000000000 +1000
+++ ./fs/nfsd/vfs.c	2006-04-03 15:12:07.000000000 +1000
@@ -371,7 +371,6 @@ out_nfserr:
 static ssize_t nfsd_getxattr(struct dentry *dentry, char *key, void **buf)
 {
 	ssize_t buflen;
-	int error;

 	buflen = vfs_getxattr(dentry, key, NULL, 0);
 	if (buflen <= 0)
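
The same two-call pattern exists in userspace via getxattr(2), and it makes the point of the patch easy to see. The first call (NULL buffer, size 0) only returns a length hint, and the length that counts is the one returned by the second call. This C sketch goes one step further than the patch and retries when the value grows between the calls, in which case getxattr fails with ERANGE. read_xattr is a hypothetical name and the retry count is arbitrary.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read extended attribute `name` of `path` into a malloc'd buffer.
 * Returns the value's length (from the second getxattr call) and sets
 * *out, or returns -1 with errno set. The caller frees *out. */
static ssize_t read_xattr(const char *path, const char *name, void **out)
{
    for (int attempt = 0; attempt < 3; attempt++) {
        /* First call: NULL buffer, size 0 -> current length only. */
        ssize_t hint = getxattr(path, name, NULL, 0);
        if (hint < 0)
            return -1;

        void *buf = malloc(hint > 0 ? (size_t)hint : 1);
        if (buf == NULL)
            return -1;               /* malloc sets errno to ENOMEM */

        /* Second call: this return value is the real length, which may
         * be smaller than the hint (e.g. an over-estimating filesystem). */
        ssize_t len = getxattr(path, name, buf, (size_t)hint);
        if (len >= 0) {
            *out = buf;
            return len;
        }
        int saved = errno;
        free(buf);
        if (saved != ERANGE) {       /* ERANGE: value grew, so retry */
            errno = saved;
            return -1;
        }
    }
    errno = ERANGE;
    return -1;
}
```

Using len rather than hint is the whole fix: with a filesystem that over-reports in the first call, trusting hint would hand back trailing garbage as part of the value.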

NeilBrown | 3 Apr 2006 07:18

[PATCH 005 of 16] knfsd: nfsd: oops exporting nonexistent directory


Export a directory that does not exist:
	exportfs -orw,fsid=0,insecure,no_subtree_check client:/home/NFS4

Try to mount it from the client with NFSv4. The mount hangs (I'm not sure
why; that's another issue).

While the client is hung, back on the server:

	mkdir /home/NFS4

The server panics in dput. I traced the problem back to svc_export_parse()
calling path_release() even though path_lookup() failed (it happens
to fill in the nameidata structure with a negative dentry - so the test
after out: succeeds).

After patching and recreating the problem, the client mount still takes
some time before finally exiting with the message "couldn't read
superblock".

Here is a simple patch to resolve this issue:

Signed-Off-By: Frank Filz <ffilzlnx <at> us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields <at> citi.umich.edu>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./fs/nfsd/export.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

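
Since the patch body is cut off here, the following is a toy C model of the bug as described, not the actual fix. The lookup leaves a dentry in the nameidata even when it fails, so cleanup that keys off "is there a dentry?" drops a reference it never took, while cleanup keyed off the lookup's return code stays balanced. All types and functions below are simplified, hypothetical stand-ins for the kernel ones.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry    { int refcount; };
struct nameidata { struct dentry *dentry; };

/* Stand-in for path_lookup(): on failure it still leaves a (negative)
 * dentry in nd, but without taking a reference for the caller. */
static int toy_path_lookup(struct dentry *d, bool exists, struct nameidata *nd)
{
    nd->dentry = d;
    if (!exists)
        return -ENOENT;
    d->refcount++;               /* success: caller now holds a reference */
    return 0;
}

/* Stand-in for path_release(): drops the caller's reference (dput). */
static void toy_path_release(struct nameidata *nd)
{
    nd->dentry->refcount--;
}

/* Buggy pattern: cleanup tests the dentry pointer, so it also runs after
 * a failed lookup and drops a reference that was never taken. */
static void parse_buggy(struct dentry *d, bool exists)
{
    struct nameidata nd = { NULL };
    (void)toy_path_lookup(d, exists, &nd);
    if (nd.dentry)
        toy_path_release(&nd);
}

/* Fixed pattern: release only after a successful lookup. */
static void parse_fixed(struct dentry *d, bool exists)
{
    struct nameidata nd = { NULL };
    if (toy_path_lookup(d, exists, &nd) == 0)
        toy_path_release(&nd);
}
```

In the real kernel, the extra put in the buggy path is the kind of imbalance that eventually ends in the dput() panic reported above.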