1 Oct 2004 17:40
Re: NFS stops responding
Jason Holmes <jholmes <at> psu.edu>
2004-10-01 15:40:38 GMT
FYI, I'm beginning to suspect that this is a problem more with the newer
RedHat kernels than anything else. I've only had one vanilla kernel NFS
lockup since I moved the NFS servers to 2.6.8.1 (3 days ago), and that
happened right after I did the move, so it could be coincidental. Back
when the servers ran RedHat kernels, the RedHat kernel clients never
locked up whereas the vanilla clients did. Yesterday I had 4 NFS
lockups on the same machine running the RedHat 2.4.21-20.ELsmp kernel
(the one that generated the trace below), but it hasn't locked up since
I moved it to 2.6.8.1. I guess I'll know for sure if my lockups don't
come back for a week or so.
Thanks,
--
Jason Holmes
Jason Holmes wrote:
> Here's a 'sysrq-T' listing for a few hung processes. Unfortunately,
> this was on a 2.4.21-20.ELsmp RedHat kernel and not a vanilla kernel
> (I'll send one of those along as soon as I can get one):
>
> xauth D 00000100e2d30370 1312 9600 9599 (NOTLB)
>
> Call Trace: [<ffffffff80120d8a>]{io_schedule+42}
> [<ffffffff801420ed>]{___wait_on_page+285}
> [<ffffffff8014316a>]{do_generic_file_read+1258}
> [<ffffffff80143770>]{file_read_actor+0}
> [<ffffffff801438c5>]{generic_file_new_read+165}
> [<ffffffffa02ec3a9>]{:nfs:nfs_file_read+217}
> [<ffffffff8015dfd2>]{sys_read+178}
> clear_thread_flag(TIF_SIGPENDING);
> - interruptible_sleep_on_timeout(&lockd_exit, HZ);
> - if (nlmsvc_pid) {
> + set_current_state(TASK_UNINTERRUPTIBLE);
Nope. Those clearly are not the same.
Note that you probably also want to move the call to
set_current_state(TASK_INTERRUPTIBLE) inside the loop. In that case you
can also remove the call to set_current_state(TASK_RUNNING) ('cos
schedule_timeout() will do that for you).
Also, why aren't you using the more standard DECLARE_WAITQUEUE(__wait)?
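
To illustrate the loop shape suggested above, here is a minimal userspace
sketch. It is not the actual lockd code or the patch under review: the
kernel primitives are replaced by stubs, and the names nlmsvc_pid_stub,
ticks_until_exit and wait_for_lockd_exit are hypothetical, made up for this
example. The only kernel behaviour it relies on is that schedule_timeout()
leaves the task in TASK_RUNNING when it returns.

```c
#include <stdio.h>

/* Userspace stand-ins for kernel primitives; illustration only. */
#define HZ 100  /* assumed tick rate, only used as the timeout argument */

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

static enum task_state current_state = TASK_RUNNING;
static int nlmsvc_pid_stub = 1;   /* hypothetical: nonzero while lockd is alive */
static int ticks_until_exit = 3;  /* hypothetical: lockd "exits" after 3 sleeps */

static void set_current_state(enum task_state s)
{
    current_state = s;
}

/* Stub: models only the guarantee that the caller is TASK_RUNNING on return. */
static long schedule_timeout(long timeout)
{
    (void)timeout;
    if (--ticks_until_exit == 0)
        nlmsvc_pid_stub = 0;      /* simulate lockd going away */
    current_state = TASK_RUNNING;
    return 0;
}

/* Returns the number of times the loop slept before lockd was gone. */
static int wait_for_lockd_exit(void)
{
    int sleeps = 0;

    while (nlmsvc_pid_stub != 0) {
        /* State is set on every iteration, inside the loop. */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(HZ);
        /* No set_current_state(TASK_RUNNING) needed here. */
        sleeps++;
    }
    return sleeps;
}
```

Because the state is re-armed at the top of each pass and the stubbed
schedule_timeout() resets it, the loop never sleeps in a stale state and
needs no explicit TASK_RUNNING call afterwards. A real kernel version would
also have to put the task on a wait queue so that a wake-up can end the
sleep early.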
Cheers,
Trond