Jason Holmes | 1 Oct 17:40 2004

Re: NFS stops responding

FYI, I'm beginning to suspect that this is a problem more with the newer 
RedHat kernels than anything else.  I've only had one vanilla kernel NFS 
lockup since I moved the NFS servers to 2.6.8.1 (3 days) and that 
happened right after I did the move, so it could be coincidental.  Back 
when the servers ran RedHat kernels, the RedHat kernel clients never 
locked up whereas the vanilla clients did.  Yesterday I had 4 NFS 
lockups on the same machine running the RedHat 2.4.21-20.ELsmp kernel 
(the one that generated the trace below), but it hasn't locked up since 
I moved it to 2.6.8.1.  I guess I'll know for sure if my lockups don't 
come back for a week or so.

Thanks,

--
Jason Holmes

Jason Holmes wrote:
> Here's a 'sysrq-T' listing for a few hung processes.  Unfortunately, 
> this was on a 2.4.21-20.ELsmp RedHat kernel and not a vanilla kernel 
> (I'll send one of those along as soon as I can get one):
> 
> xauth         D 00000100e2d30370  1312  9600   9599 (NOTLB)
> 
> Call Trace: [<ffffffff80120d8a>]{io_schedule+42} 
> [<ffffffff801420ed>]{___wait_on_page+285}
>        [<ffffffff8014316a>]{do_generic_file_read+1258} 
> [<ffffffff80143770>]{file_read_actor+0}
>        [<ffffffff801438c5>]{generic_file_new_read+165} 
> [<ffffffffa02ec3a9>]{:nfs:nfs_file_read+217}
>        [<ffffffff8015dfd2>]{sys_read+178} 

John Roberts | 1 Oct 18:41 2004

Seeking NFS developer tips


Hi,

I've got the Linux kernel sources and built them.
I now want to try using my hacked version of NFS.

I'm just wondering how NFS developers switch to
using their own version of NFS?

Do you copy the NFS binaries over to the kernel
directories?  Or do you hack the /etc/rc.d/init.d
script to point to a locally built binary?

Also, do you ever have to reboot Linux to restart
NFS?  Or does NFS behavior never really depend
on a "fresh" boot of the OS?

Thanks!

John Roberts

john_roberts <at> credence.com

_______________________________________________
NFS maillist  -  NFS <at> lists.sourceforge.net

Hadmut Danisch | 4 Oct 01:03 2004

NFS incompatible with Solaris jumpstart

Hi,

I want to jumpstart-install (initial installation of the 
Solaris operating system) a sparc machine from a linux server
with kernel 2.6.8.1.

It works when using the nfs-user-server, but the sparc hangs after 
printing the OS prompt when using the nfs-kernel-server.

Any idea how the nfs-user-server and the nfs-kernel-server differ, 
and how to fix it?

regards
Hadmut

Hadmut Danisch | 4 Oct 10:11 2004

Re: NFS incompatible with Solaris jumpstart

On Mon, Oct 04, 2004 at 01:10:46AM +0100, P. Benie wrote:
> Re your jumpstart problem - a colleague in another department had trouble
> with this once. I don't think he had the time to fix the problem.
> 
> He did get as far as finding that the client got an error from the NFS
> server when accessing /dev/console. This would give the impression of
> hanging as soon as the kernel starts init. It should be easy to verify if
> you have the same problem using tcpdump.

/dev/ was a good hint. I found the problem:

When booting, the Solaris system wants to write to the 
boot environment (nasty, isn't it?).

Since the userspace daemon doesn't understand the /etc/exports syntax
of the kernel version, I wrote a different, simplified /etc/exports
for the userspace daemon. 

The userspace daemon allowed write access, the kernel version didn't. 
That was the problem.

regards
Hadmut
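
The difference described above comes down to export options. In the kernel server's /etc/exports (nfs-utils syntax), an export is read-only unless `rw` is listed. A minimal sketch, assuming a hypothetical export path `/export/jumpstart` and client name `sparc-client`, neither of which appears in the thread:

```
# /etc/exports for the kernel NFS server (nfs-utils syntax).
# Path and host name are hypothetical placeholders.

# Read-only: "ro" is the default when "rw" is not listed.
#/export/jumpstart   sparc-client(ro,sync)

# Read-write, so the booting client can write to its boot environment.
/export/jumpstart    sparc-client(rw,sync)
```

After editing the file, `exportfs -ra` makes the kernel server re-read it. Whether a particular Jumpstart setup needs further options is not stated in the thread.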

Steve Dickson | 4 Oct 18:24 2004

[PATCH] lockd

Hey Neil,

Attached is a patch that fixes some potential SMP races
in the lockd code that were identified by the SLEEP_ON_BKLCHECK
that was (at one time) in the -mm tree...

Signed-Off-By: Steve Dickson <SteveD <at> RedHat.com>

Trond Myklebust | 4 Oct 19:50 2004

Re: [PATCH] lockd

On Mon, 04/10/2004 at 18:24, Steve Dickson wrote:

> Hey Neil,

Hey! This is the client side NLM code... 8-)

>  	clear_thread_flag(TIF_SIGPENDING);
> -	interruptible_sleep_on_timeout(&lockd_exit, HZ);
> -	if (nlmsvc_pid) {
> +	set_current_state(TASK_UNINTERRUPTIBLE);

Nope. Those clearly are not the same.

Note that you probably also want to move the call to
set_current_state(TASK_INTERRUPTIBLE) inside the loop. In that case you
can also remove the call to set_current_state(TASK_RUNNING) ('cos
schedule_timeout() will do that for you).

Also, why aren't you using the more standard DECLARE_WAITQUEUE(__wait)?

Cheers,
  Trond

Trond Myklebust | 4 Oct 20:01 2004

Re: [PATCH] lockd

On Mon, 04/10/2004 at 18:24, Steve Dickson wrote:
> Hey Neil,
> 
> Attached is a patch that fixes some potential SMP races
> in the lockd code that were identified by the SLEEP_ON_BKLCHECK
> that was (at one time) in the -mm tree...

Just for the record: the "SMP race condition" argument given here is
completely bogus. sleep_on_* is quite safe to use when the SMP races are
being handled using the BKL, as is the case here.

That said, I agree that the patch is of interest given the long term
goal of removing the BKL completely. Perhaps you could therefore also
amend your changelog entry text to reflect this motive?

Cheers,
  Trond

Arjan van de Ven | 4 Oct 20:13 2004

Re: [PATCH] lockd

On Mon, 2004-10-04 at 20:01, Trond Myklebust wrote:
> On Mon, 04/10/2004 at 18:24, Steve Dickson wrote:
> > Hey Neil,
> > 
> > Attached is a patch that fixes some potential SMP races
> > in the lockd code that were identified by the SLEEP_ON_BKLCHECK
> > that was (at one time) in the -mm tree...
> 
> Just for the record: the "SMP race condition" argument given here is
> completely bogus. sleep_on_* is quite safe to use when the SMP races are
> being handled using the BKL, as is the case here.

actually this triggered because there was NO bkl... if you hold the bkl
the warning doesn't trigger.....

Trond Myklebust | 4 Oct 20:22 2004

Re: [PATCH] lockd

On Mon, 04/10/2004 at 20:13, Arjan van de Ven wrote:

> actually this triggered because there was NO bkl... if you hold the bkl
> the warning doesn't trigger.....

Then the fix is downright wrong.

We *must* be holding the BKL upon entry to nlmclnt_lock(). All sorts of
other things depend on it.

Cheers,
  Trond

Steve Dickson | 4 Oct 21:25 2004

Re: [PATCH] lockd


Trond Myklebust wrote:

>On Mon, 04/10/2004 at 18:24, Steve Dickson wrote:
>
>>Hey Neil,
>
>Hey! This is the client side NLM code... 8-)

Sorry buddy.... I'm having one of those days!!!! :-\

>Note that you probably also want to move the call to
>set_current_state(TASK_INTERRUPTIBLE) inside the loop. In that case you
>can also remove the call to set_current_state(TASK_RUNNING) ('cos
>schedule_timeout() will do that for you).

Ok...

>Also, why aren't you using the more standard DECLARE_WAITQUEUE(__wait)?

I guess I didn't realize that would be a better way to do it... I'll 
look into it...