Cedric Jeanneret | 1 Feb 2010 13:31

NFS stall problem

Hello,

We're currently using OpenVZ (2.6.18-128.2.1.el5.028stab064.4) on Debian lenny nodes, with Debian
lenny or etch VEs (the kernel was converted with alien from the latest OpenVZ rpm).

We have an NFS server on a hardware node (so NOT in a VE) which shares some cartographic data (meaning a LOT of
small files, aka tiles).

For now, we mount the NFS shares in the VEs this way:
nfs-server:/path/to/share on /path/to/mount/point type nfs (ro,nfsvers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=nfs-server)
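
For readers who want to unpack the options in that mount line, here is a minimal sketch. The helper name `parse_mount_options` is made up for illustration, and the unit conversion assumes the nfs(5) convention that `timeo` is given in tenths of a second.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (such as "ro" or "hard") map to True.
    """
    options = {}
    for item in option_string.strip("()").split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


# Option string copied from the mount line above.
MOUNT_OPTIONS = (
    "(ro,nfsvers=3,rsize=32768,wsize=32768,hard,proto=tcp,"
    "timeo=600,retrans=2,sec=sys,addr=nfs-server)"
)

opts = parse_mount_options(MOUNT_OPTIONS)

# Assumption: timeo is expressed in deciseconds, as documented in nfs(5).
timeout_seconds = int(opts["timeo"]) / 10
is_hard = opts.get("hard", False) is True

print(f"NFS version: {opts['nfsvers']}, transport: {opts['proto']}")
print(f"timeout per try: {timeout_seconds:.0f}s, retrans: {opts['retrans']}")
print(f"hard mount: {is_hard}")
```

With `hard`, the client keeps retrying a request instead of returning an error to the application, which would be consistent with processes blocking once the server stops answering. It does not by itself explain why the server stops answering.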

The share is exported as follows:
/path/to/share     IP(ro,all_squash,anongid=65534,sync,no_subtree_check)

Problem:
Sometimes (about once a day, or every two days) the NFS share seems to stall. That means:
- unable to access the data from the VE
- unable to umount the share in the VE

The only thing we can do is restart the VE.
As far as I can see, the other VEs with an NFS share on the same HN don't have any problem (well, for a while,
then they stall too).

We don't have any logs in syslog or kern.log, either on the HN or in the VEs.
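
Since nothing is logged, one way to at least timestamp the stalls is a small probe run from cron inside the VE. The following is only a sketch under stated assumptions: the function name `probe_mount` and the 10-second limit are arbitrary choices, and it assumes a `statvfs()` call on the mount point actually reaches the NFS server rather than being answered from a client-side cache.

```python
import subprocess
import sys
import time

# Child program: a single statvfs() call on the path given as argv[1].
_CHILD_CODE = "import os, sys; os.statvfs(sys.argv[1])"


def probe_mount(path, timeout_s=10.0, poll_interval_s=0.1):
    """Return "ok", "error" or "stalled" for the given mount point.

    The filesystem call runs in a child process so that a hung NFS
    request blocks the child, not the monitoring script itself.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        return_code = child.poll()
        if return_code is not None:
            return "ok" if return_code == 0 else "error"
        time.sleep(poll_interval_s)
    # Best effort only: a process blocked on a hard NFS mount may ignore
    # SIGKILL until the server answers, so we do not wait for it here.
    child.kill()
    return "stalled"


if __name__ == "__main__":
    # Placeholder mount point taken from the mount line; replace as needed.
    mount_point = "/path/to/mount/point"
    print(time.strftime("%Y-%m-%d %H:%M:%S"), mount_point, probe_mount(mount_point))
```

Appending that output to a file every minute gives a timeline that can be compared against network or server events, even when syslog and kern.log stay silent.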

NFS-SRV is in a VLAN, and distributes files across up to 3 VLANs. Firewall rules are correct...

Any ideas?

I've read on the wiki that "it's better to mount nfs share on HN, then use a mount -o bind". We can't afford that, as [...]

Rafael Ruiz | 1 Feb 2010 20:53

Re: NFS stall problem

Hello, NFSv3 is known to have stability issues on Linux, so by default it
uses NFSv2, which is more stable but doesn't support files larger than 2GB.
As far as I can see, you are using NFSv3. I would recommend trying
NFSv4; the link
http://wiki.linux-nfs.org/wiki/index.php/General_troubleshooting_recommendations
should help with setting this up.

This might help too:
http://www.citi.umich.edu/projects/nfsv4/OLS2001/tsld004.htm

I hope this helps.

Best regards,

Gordan Bobic | 1 Feb 2010 23:35

Re: Re: NFS stall problem

NFSv2?! You are joking, right? NFSv3 has been stable for over a decade. 
Last time I saw NFSv2 in the wild was in the '90s.

If you said NFSv4 was unstable, that I might have been inclined to agree 
with.

Gordan

Cédric Jeanneret | 2 Feb 2010 07:35

Re: Re: NFS stall problem

Hello,

Hm, in fact lenny includes NFSv4. But we had the same troubles with nfsv3
(they were even worse). I'll check the links and keep this thread
updated.

Thank you for your concern.

Best regards,

C.

Cliff Wells | 2 Feb 2010 22:28

BUG: scheduling while atomic

I'm running proxmox 1.4:

Linux proxmox1 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009 x86_64 GNU/Linux

I occasionally get these in syslog, sometimes they come in so fast the
machine becomes unresponsive and must be power cycled:

Jan 30 13:35:01 proxmox1 /USR/SBIN/CRON[31143]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Jan 30 13:35:01 proxmox1 /USR/SBIN/CRON[31146]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)
Jan 30 13:35:03 proxmox1 kernel: BUG: scheduling while atomic: nginx/7242/0x00000003
Jan 30 13:35:03 proxmox1 kernel: Pid: 7242, comm: nginx Tainted:  G   M    2.6.24-9-pve #1
Jan 30 13:35:03 proxmox1 kernel:
Jan 30 13:35:03 proxmox1 kernel: Call Trace:
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff804c7485>] thread_return+0x103/0x67e
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8024d8f4>] lock_timer_base+0x34/0x70
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8024debd>] __mod_timer+0xbd/0xe0
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff804c7d38>] schedule_timeout+0x58/0xd0
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8024d5c0>] process_timeout+0x0/0x10
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff804c7d33>] schedule_timeout+0x53/0xd0
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8030876a>] sys_epoll_wait+0x49a/0x560
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8044c0a4>] compat_sys_getsockopt+0x74/0x1d0
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff802370f0>] default_wake_function+0x0/0x10
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8022aae2>] ia32_sysret+0x0/0xa
Jan 30 13:35:03 proxmox1 kernel:
Jan 30 13:35:03 proxmox1 kernel: BUG: scheduling while atomic: swapper/0/0x00000005
Jan 30 13:35:03 proxmox1 kernel: Pid: 0, comm: swapper Tainted:  G   M    2.6.24-9-pve #1
Jan 30 13:35:03 proxmox1 kernel:
Jan 30 13:35:03 proxmox1 kernel: Call Trace:
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff804c7485>] thread_return+0x103/0x67e
Jan 30 13:35:03 proxmox1 kernel: [<ffffffff8025fd44>] hrtimer_start+0xd4/0x190
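
When these messages arrive in bursts, it can help to tally which tasks trigger them. The sketch below parses the "BUG: scheduling while atomic" lines from a syslog excerpt. The names `BUG_LINE` and `summarize` are illustrative, and reading the third field as the kernel's preemption counter is an assumption based on the usual message format, not something stated in the log itself.

```python
import re
from collections import Counter

# Matches e.g. "BUG: scheduling while atomic: nginx/7242/0x00000003"
BUG_LINE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>\d+)/(?P<count>0x[0-9a-fA-F]+)\s*$"
)


def summarize(lines):
    """Count BUG reports per (command, pid, counter value)."""
    counts = Counter()
    for line in lines:
        match = BUG_LINE.search(line)
        if match:
            key = (
                match.group("comm"),
                int(match.group("pid")),
                int(match.group("count"), 16),
            )
            counts[key] += 1
    return counts


# Two lines copied from the excerpt above.
SAMPLE = [
    "Jan 30 13:35:03 proxmox1 kernel: BUG: scheduling while atomic: nginx/7242/0x00000003",
    "Jan 30 13:35:03 proxmox1 kernel: BUG: scheduling while atomic: swapper/0/0x00000005",
]

for (comm, pid, count), hits in summarize(SAMPLE).items():
    print(f"{comm} (pid {pid}), counter {count}: {hits} report(s)")
```

In practice the input would be the lines of the syslog file rather than the two sample lines used here.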

John Drescher | 2 Feb 2010 22:36

Re: BUG: scheduling while atomic

On Tue, Feb 2, 2010 at 4:28 PM, Cliff Wells <cliff@...> wrote:
> I'm running proxmox 1.4:
>
> Linux proxmox1 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009 x86_64 GNU/Linux
>
>
> I occasionally get these in syslog, sometimes they come in so fast the
> machine becomes unresponsive and must be power cycled:

John Drescher | 2 Feb 2010 22:38

Re: BUG: scheduling while atomic

On Tue, Feb 2, 2010 at 4:36 PM, John Drescher <drescherjm@...> wrote:
> On Tue, Feb 2, 2010 at 4:28 PM, Cliff Wells <cliff@...> wrote:
>> I'm running proxmox 1.4:
>>
>> Linux proxmox1 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009 x86_64 GNU/Linux
>>
>>
>> I occasionally get these in syslog, sometimes they come in so fast the
>> machine becomes unresponsive and must be power cycled:

Cliff Wells | 2 Feb 2010 23:53

Re: BUG: scheduling while atomic

Thanks much.  That was my suspicion.

Cliff

Dietmar Maurer | 3 Feb 2010 08:24

RE: BUG: scheduling while atomic

> -----Original Message-----
> From: users-bounces@... [mailto:users-bounces@...] On Behalf Of Cliff Wells
> Sent: Dienstag, 02. Februar 2010 22:28
> To: users@...
> Subject: [Users] BUG: scheduling while atomic
> 
> I'm running proxmox 1.4:
> 
> Linux proxmox1 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009
> x86_64 GNU/Linux

I suggest updating to Proxmox 1.5 and trying the 2.6.18 kernel branch.

- Dietmar

Cliff Wells | 3 Feb 2010 18:47

RE: BUG: scheduling while atomic

On Wed, 2010-02-03 at 08:24 +0100, Dietmar Maurer wrote:
> I suggest to update to proxmox 1.5 and try the 2.6.18 kernel branch.

I have done so, thanks.   I'd still be interested in trying a proxmox
kernel minus preemption.

Cliff
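
For anyone comparing kernels in this context, a quick way to see whether a given build has preemption enabled is to look at the kernel version string, as in the `uname` output quoted earlier in the thread. This is a rough sketch: the function name `is_preemptible` is made up, and it assumes the build advertises preemption with a token starting with "PREEMPT" in that string.

```python
import platform


def is_preemptible(version_string):
    """Return True if the version string carries a PREEMPT token."""
    return any(token.startswith("PREEMPT") for token in version_string.split())


# Version part of the uname line reported for the 2.6.24-9-pve kernel.
REPORTED = "#1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009"

print("reported kernel preemptible:", is_preemptible(REPORTED))
print("running kernel preemptible:", is_preemptible(platform.version()))
```

A kernel built without preemption would simply lack that token, so the same check can confirm that a replacement kernel differs in this respect.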
