Garrick Staples | 1 May 2004 01:43

Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 02:24:14PM -0700, Garrick Staples alleged:
> Hi all,
>    I'm having a terrible time with mountd segfaulting on two Itanium boxes.  I
> can't find a specific trigger, but I can generally trigger it within a few
> minutes by just calling mount/umount a few hundred times.
> 
> I'm using glibc 2.3.2 and nfs-utils 1.0.6 from RHE.
> 
> In the tests below, I have a single directory exported to 10.125.0.0/16.  Since
> I know name resolution was a recent problem, I've made sure all clients are in
> /etc/hosts.  I'm using NIS, but files is before dns and nis in nsswitch.conf.
> I've also tested with and without nscd running.

> select(1024, [3 4 5 6 7], NULL, NULL, NULL) = 2 (in [5 6])
> read(5, "", 0)                          = 0
> --- SIGSEGV (Segmentation fault) @ 20000008002c19d0 (63742f3132353111) ---

> write(5, "10.125.0.0/16 0 \\x00080011020000"..., 62) = 62
> --- SIGSEGV (Segmentation fault) @ 20000000002899d0 (7064752f35343639) ---

I just spotted a pattern.  After collecting several strace samples, it always
segfaults after read() or write() to fd 5.  And fd 5 is always:

   open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

I have no idea what the file is for, but grep'ing my straces shows that mountd
doesn't normally use it.  It can handle hundreds of mount/umount requests
without ever touching fd 5.  Then at some point it reads once:

   read(5, "10.125.0.0/16 0 \\x00080011020000"..., 128) = 35
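
The pattern described above (a SIGSEGV logged right after a read() or write() on fd 5) can be checked mechanically across many strace samples. What follows is a minimal sketch in C, not anything taken from nfs-utils. It assumes plain strace output with one syscall per line and no pid prefix, as in the excerpts above, and the names rw_fd and segv_after_rw are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the descriptor (first argument) of a read()/write() strace line,
 * or -1 if the line is any other syscall. */
int rw_fd(const char *line)
{
    int fd;

    if (sscanf(line, "read(%d,", &fd) == 1 ||
        sscanf(line, "write(%d,", &fd) == 1)
        return fd;
    return -1;
}

/* Scan a strace log and count the SIGSEGV lines whose immediately
 * preceding line was a read()/write() on wanted_fd. */
int segv_after_rw(FILE *log, int wanted_fd)
{
    char line[4096];
    int prev_fd = -1;
    int hits = 0;

    assert(log != NULL);
    while (fgets(line, sizeof line, log) != NULL) {
        if (strncmp(line, "--- SIGSEGV", 11) == 0) {
            if (prev_fd == wanted_fd)
                hits++;
        } else {
            prev_fd = rw_fd(line);   /* remember the last syscall seen */
        }
    }
    return hits;
}
```

Calling segv_after_rw(fopen("mountd.strace", "r"), 5) on each saved sample (the file name is only an example) and comparing the count with the total number of crashes would confirm or refute the "always fd 5" observation.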

J. Bruce Fields | 1 May 2004 02:15

Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples wrote:
> I just spotted a pattern.  After collecting several strace samples, it always
> segfaults after read() or write() to fd 5.  And fd 5 is always:
> 
>    open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

Any interesting messages from the kernel (in /var/log/messages)?

--Bruce Fields

-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
NFS maillist  -  NFS <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

Garrick Staples | 1 May 2004 02:24

Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 08:15:35PM -0400, J. Bruce Fields alleged:
> On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples wrote:
> > I just spotted a pattern.  After collecting several strace samples, it always
> > segfaults after read() or write() to fd 5.  And fd 5 is always:
> > 
> >    open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5
> 
> Any interesting messages from the kernel (in /var/log/messages)?

Nope, nothing in dmesg either.

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California
Garrick Staples | 1 May 2004 05:07

Re: mountd segfault on itanium2

On Fri, Apr 30, 2004 at 04:43:27PM -0700, Garrick Staples alleged:
> I just spotted a pattern.  After collecting several strace samples, it always
> segfaults after read() or write() to fd 5.  And fd 5 is always:
> 
>    open("/proc/net/rpc/nfsd.fh/channel", O_RDWR) = 5

I have an ugly work-around that seems to be working.  It seems that 2.6 has a
new nfs interface for userspace.  By forcing mountd to use the older 2.4
interface, it doesn't segfault anymore.  So something in the new code paths is
broken.

In support/nfs/cacheio.c:
int
check_new_cache(void)
{
        struct stat stb;

        return 0;  /* DISABLE NEW 2.6 INTERFACE */

        return  (stat("/proc/fs/nfs/filehandle", &stb) == 0) ||
                (stat("/proc/fs/nfsd/filehandle", &stb) == 0);
}

Am I losing any functionality by doing this?  I can't actually find any
problems.
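
A variant of the work-around above that avoids rebuilding mountd to switch back: the same probe, but with the unconditional "return 0" replaced by a run-time switch. This is only a sketch. The environment variable name MOUNTD_FORCE_OLD_CACHE and the function name check_new_cache_switchable are hypothetical and do not exist in nfs-utils.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical variable name, invented for this sketch. */
#define DISABLE_ENV "MOUNTD_FORCE_OLD_CACHE"

/* Same test as check_new_cache() in the message above, except that the
 * 2.6 interface is reported as absent only when DISABLE_ENV starts
 * with '1'. */
int check_new_cache_switchable(void)
{
    struct stat stb;
    const char *off = getenv(DISABLE_ENV);

    if (off != NULL && off[0] == '1')
        return 0;   /* pretend the new 2.6 interface is not there */

    return (stat("/proc/fs/nfs/filehandle", &stb) == 0) ||
           (stat("/proc/fs/nfsd/filehandle", &stb) == 0);
}
```

With the variable unset, the function behaves like the unpatched check, which makes it easier to compare the two code paths on the same binary.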

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California

Steve Dickson | 1 May 2004 18:13

Re: [PATCH] Reinstantiating stale inodes

Steve Dickson wrote:

> Here is a 2.4 patch that will reinstantiate an inode
> when an ESTALE error is returned on a getattr. When
> the error occurs, a lookup is immediately issued
> to get a new fh.
>
> This fixes the problem of a server running rsync -a on a
> directory that a client has mounted. The key being the -a flag
> since it causes the server not to update the mtime on
> the directory.

It turns out there is a much simpler (and safer) way to reinstantiate
inodes than my original patch.

Only parts of an inode need to be reinstantiated, not the entire thing,
so this second patch reinstantiates just two parts of the inode (the
fhandle and the fattrs) instead of the entire inode like my original
patch did.

I had to export the new nfs_reinstantiate() function because
the ACL code needs to use it, so things like ls -l will work....

Comments? Is this something Marcelo might be interested in?

SteveD.

--- linux-2.4.21/fs/nfs/inode.c.org	2004-04-17 18:26:32.000000000 -0400

Trond Myklebust | 1 May 2004 21:25

Re: [PATCH] Reinstantiating stale inodes

On Sat, 2004-05-01 at 12:13, Steve Dickson wrote:

> Is this something Marcelo might be interested in?

Vetoed!

You are not listening to what I am saying: you are NOT allowed to change
inodes in this manner. The filehandle defines which file you are
referencing. By changing the filehandle, you are changing files from
beneath other processes and your own process.

Imagine if I do

rm blah; ln /etc/passwd blah

Your patch means that anyone that was writing to file "blah" before you
deleted it, will suddenly find themselves overwriting /etc/passwd. That
is NOT POSIX-compatible behaviour!

The ONLY way to overcome a stale inode is to d_drop() the dentry, unhash
the inode, and then force the VFS to look up a new inode. That way only
new calls to open() end up overwriting /etc/passwd in the above case.
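
The behaviour described above can be observed on a local POSIX filesystem. The sketch below is not the NFS client code path and is not from either patch. It replays "rm blah; ln /etc/passwd blah" in a temporary directory, with a scratch file named target standing in for /etc/passwd, and checks that a descriptor opened on blah beforehand still refers to the old file. The function name old_fd_keeps_old_file and the file names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a descriptor opened on "blah" before the rm/ln still refers
 * to the old file afterwards and "target" is left untouched, 0 if not,
 * -1 on a setup error. */
int old_fd_keeps_old_file(void)
{
    char dir[] = "/tmp/estale-demo-XXXXXX";
    char blah[64], target[64], buf[16] = {0};
    struct stat by_fd, by_name;
    int fd = -1, tfd = -1, ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(blah, sizeof blah, "%s/blah", dir);
    snprintf(target, sizeof target, "%s/target", dir);

    tfd = open(target, O_CREAT | O_WRONLY, 0600);
    fd = open(blah, O_CREAT | O_WRONLY, 0600);   /* the existing writer */
    if (tfd < 0 || fd < 0 || write(tfd, "secret", 6) != 6)
        goto out;
    close(tfd);
    tfd = -1;

    /* rm blah; ln target blah */
    if (unlink(blah) != 0 || link(target, blah) != 0)
        goto out;

    /* The writer that already had "blah" open keeps writing. */
    if (write(fd, "XXXXXX", 6) != 6)
        goto out;

    if (fstat(fd, &by_fd) != 0 || stat(blah, &by_name) != 0)
        goto out;
    tfd = open(target, O_RDONLY);
    if (tfd < 0 || read(tfd, buf, 6) != 6)
        goto out;

    /* Old descriptor and new name must be different inodes, and the
     * stand-in for /etc/passwd must still hold its original bytes. */
    ok = by_fd.st_ino != by_name.st_ino && memcmp(buf, "secret", 6) == 0;
out:
    if (fd >= 0)
        close(fd);
    if (tfd >= 0)
        close(tfd);
    unlink(blah);
    unlink(target);
    rmdir(dir);
    return ok;
}
```

Only a fresh open() of the name blah after the ln would reach target, which is the distinction drawn above between existing writers and new calls to open().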

Cheers,
  Trond


Steve Dickson | 2 May 2004 01:57

Re: [PATCH] Reinstantiating stale inodes

Trond Myklebust wrote:

>You are not listening to what I am saying: you are NOT allowed to change
>inodes in this manner. The filehandle defines which file you are
>referencing. By changing the filehandle, you are changing files from
>beneath other processes and your own process.
>  
>
Believe me... I hear you.... and do understand what I'm trying to do....
And I also apologize for being so persistent.... but it just bothers the
hell out of me that other clients can recover from ESTALEs and we don't...

>Imagine if I do
>
>rm blah; ln /etc/passwd blah
>
>Your patch means that anyone that was writing to file "blah" before you
>deleted it, will suddenly find themselves overwriting /etc/passwd. That
>is NOT POSIX-compatible behaviour!
>  
>
I see this point in theory, but in my testing, bad things don't seem to
happen. The writes always fail with ESTALE... but I can't deny there is a
window just because I can't reproduce it...

>The ONLY way to overcome a stale inode is to d_drop() the dentry, unhash
>the inode, and then force the VFS to look up a new inode. That way only
>new calls to open() end up overwriting /etc/passwd in the above case.

Trond Myklebust | 2 May 2004 02:22

Re: [PATCH] Reinstantiating stale inodes

On Sat, 2004-05-01 at 19:57, Steve Dickson wrote:

> Believe me... I hear you.... and do understand what I'm trying to do....
> And I also apologize for being so persistent.... but it just bothers the
> hell out of me that other clients can recover from ESTALEs and we don't...

So please tell me why the following patch (which addresses the
particular problem that you raised of someone resetting mtime on the
parent directory) does not suffice?

Cheers,
  Trond
Attachment (gnurr.dif): text/x-patch, 3116 bytes
Steve Dickson | 2 May 2004 05:19

Re: [PATCH] Reinstantiating stale inodes

Trond Myklebust wrote:

>So please tell me why the following patch (which addresses the
>particular problem that you raised of someone resetting mtime on the
>parent directory) does not suffice?
>  
>
Again this is where I started... and this patch does take care of the ESTALEs
but it also increases normal traffic by 2% to 3% (mostly getattrs and lookups)
when I ran the connectathon04 tests... Granted 2% to 3% is not that much
of an increase and my testing is not that exact... but.... I didn't think any
increase would be acceptable for an error that generally does not happen...

But if this is the patch that's going to make it into the 2.4 tree... so be
it.... since it does avoid the ESTALE issues and maybe things will be a bit
more coherent since we do send a few more getattrs and lookups...

Thanks for your guidance.... it is definitely appreciated!

SteveD.


Trond Myklebust | 2 May 2004 05:28

Re: [PATCH] Reinstantiating stale inodes

On Sat, 2004-05-01 at 23:19, Steve Dickson wrote:
> Again this is where I started... and this patch does take care of the ESTALEs
> but it also increases normal traffic by 2% to 3% (mostly getattrs and lookups)

So this is what I don't understand.

Which are the operations that are supposed to change "ctime" without changing
"mtime"? AFAICS the *only* such operation is utime(), which was the one
that was causing you trouble in the first place.

I certainly would not expect any difference between the two when looking
at the standard connectathon suite.
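
The utime() case mentioned above can be reproduced on a local filesystem. The sketch below is not the patch under discussion. It changes a directory, puts the old mtime back with utime() (as a tool that preserves timestamps would), and checks that mtime looks untouched while ctime has moved. The function name mtime_reset_leaves_ctime and the fixed timestamp are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <utime.h>

/* Returns 1 if, after a directory is modified and its mtime is reset with
 * utime(), st_mtime is back at the old value while st_ctime is not;
 * 0 if not, -1 on a setup error. */
int mtime_reset_leaves_ctime(void)
{
    char dir[] = "/tmp/ctime-demo-XXXXXX";
    char file[64];
    struct stat st;
    /* An arbitrary "old" timestamp (September 2001). */
    struct utimbuf old = { .actime = 1000000000, .modtime = 1000000000 };
    int fd, ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(file, sizeof file, "%s/newfile", dir);

    /* Give the directory a known old mtime. */
    if (utime(dir, &old) != 0)
        goto out;

    /* Change the directory contents: mtime and ctime both move to now. */
    fd = open(file, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out;
    close(fd);

    /* Put the old mtime back; this update itself bumps ctime. */
    if (utime(dir, &old) != 0 || stat(dir, &st) != 0)
        goto out;

    ok = st.st_mtime == old.modtime && st.st_ctime != old.modtime;
out:
    unlink(file);
    rmdir(dir);
    return ok;
}
```

A cache check that compares only mtime would see no change in this directory, whereas one that also compares ctime would notice it.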

Unless....

Is your kernel perhaps missing the appended patch, which is already in
2.4.26?

Cheers,
  Trond

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/09 11:09:22-06:00 shaggy <at> kleikamp.dyn.webahead.ibm.com 
#   JFS: rename should update mtime on source and target directories
# 

