Tom Haynes | 2 Oct 2009 23:45

mount retries

Does the Linux mount client have any algorithms to retry an NFS mount?

Or does it depend entirely on the underlying protocol?

I.e., would TCP retry and UDP just give up?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chuck Lever | 3 Oct 2009 00:59

Re: mount retries

On Oct 2, 2009, at 5:45 PM, Tom Haynes wrote:
> Does the Linux mount client have any algorithms to retry an NFS mount?
>
> Or does it depend entirely on the underlying protocol?
>
> I.e., would TCP retry and UDP just give up?

The behavior is pretty complex.

There's a large retry loop that can try mounts in the foreground, or  
automatically daemonize if the mount request isn't successful at  
first.  The retry loop continues as long as there are server or  
network problems that prevent a definitive answer from the server from  
getting back to the client.  There is a retry= mount option to cut off  
retrying after a certain amount of time.
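
[Editor's sketch, not part of the original message.] The retry loop
described above can be pictured roughly as follows. This is a minimal
userspace sketch, not the mount.nfs source: the names
retry_until_deadline, attempt_fn and flaky_attempt are hypothetical, and
the timeout here is given in seconds purely for illustration. The key
point it models is that only a definitive answer (success or refusal)
ends the loop early; "no answer" keeps it going until the cutoff.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one mount attempt (hypothetical classification). */
enum attempt_result { ATTEMPT_OK, ATTEMPT_REFUSED, ATTEMPT_NO_ANSWER };

typedef enum attempt_result (*attempt_fn)(void *ctx);

/*
 * Retry while the server gives no definitive answer.
 * Returns 0 on success, -1 on a definitive refusal,
 * -2 if the deadline passes without an answer.
 */
int retry_until_deadline(attempt_fn attempt, void *ctx,
                         unsigned int timeout_secs, unsigned int pause_secs)
{
    assert(attempt != NULL);
    time_t deadline = time(NULL) + (time_t)timeout_secs;

    for (;;) {
        switch (attempt(ctx)) {
        case ATTEMPT_OK:
            return 0;
        case ATTEMPT_REFUSED:
            return -1;          /* definitive answer: stop retrying */
        case ATTEMPT_NO_ANSWER:
            break;              /* server or network problem: retry */
        }
        if (time(NULL) >= deadline)
            return -2;          /* cutoff reached, give up */
        sleep(pause_secs);
    }
}

/* Demo attempt: reports "no answer" until the counter in ctx hits zero. */
enum attempt_result flaky_attempt(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return ATTEMPT_NO_ANSWER;
    }
    return ATTEMPT_OK;
}
```

For example, with a counter of 2 and a generous timeout,
retry_until_deadline(flaky_attempt, &counter, 5, 0) makes three attempts
and returns 0; with a timeout of 0 it gives up after the first
unanswered attempt.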

If a particular NFS version or transport is requested by the user, it  
will attempt to use those settings, and fail if those aren't available  
on the server.  For v2/v3, any parameters not specified by the user,  
like port, transport, or version, are filled in with pmap queries.   
The choice of transport is a little odd; if the user didn't specify a  
transport, it looks like the transport that worked for the pmap query  
is chosen regardless of what is registered.  I.e., if a TCP pmap query
fails but the UDP pmap query works, we go with UDP, whether or not
the NFS service is registered on UDP.
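
[Editor's sketch, not part of the original message.] The transport
choice described above could be modelled like this. It is a toy model
under stated assumptions, not the mount.nfs code: the enum and the
function pick_transport are hypothetical, and trying TCP before UDP is
an assumption taken from the order of the example in the text.

```c
#include <assert.h>
#include <stdbool.h>

enum transport {
    TRANSPORT_UNSPECIFIED,  /* user gave no transport option */
    TRANSPORT_TCP,
    TRANSPORT_UDP,
    TRANSPORT_NONE          /* no pmap query got an answer */
};

/*
 * Pick a transport for a v2/v3 mount.
 * A user-requested transport is used as-is. Otherwise the transport
 * whose pmap query succeeded is chosen, without checking whether the
 * NFS service itself is registered on that transport.
 */
enum transport pick_transport(enum transport requested,
                              bool tcp_pmap_ok, bool udp_pmap_ok)
{
    assert(requested != TRANSPORT_NONE);

    if (requested != TRANSPORT_UNSPECIFIED)
        return requested;
    if (tcp_pmap_ok)        /* assumption: TCP is tried first */
        return TRANSPORT_TCP;
    if (udp_pmap_ok)
        return TRANSPORT_UDP;
    return TRANSPORT_NONE;
}
```

With no user preference, a failed TCP query and a successful UDP query,
pick_transport returns TRANSPORT_UDP, which is the odd case noted above.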

For v4, it skips the pmap query and just dives into the kernel to try  
connecting with the server.

If none of these work, and the mount command hasn't gotten a definite  
[...]

Tom Haynes | 3 Oct 2009 04:35

Re: mount retries

Chuck Lever wrote:
>
> The exact retry behavior depends on whether user space or the kernel 
> is trying to do the talking.  NFSv4 and text-based NFSv2/v3 mounts do 
> most of the talking from the kernel.  Text-based mounts do a user 
> space pmap query or two, but the MNT request comes from the kernel.  
> Also, UDP retries a few times but usually gives up after 30 seconds
> or so, whereas TCP can retry the transport connect for over 3 minutes,
> even before it gets to send any requests at all.
>
> I'm pretty sure I didn't answer your question.

Thanks Chuck, you actually nailed what I didn't ask properly.

The UDP retries being at 30s correlates with what I was seeing.

Daniel J Blueman | 3 Oct 2009 17:59

Re: [2.6.31] NFS4ERR_GRACE unhandled...

Hi Trond,

On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
<Trond.Myklebust@...> wrote:
> On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>> Hi Trond,
>>
>> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>> lock or file state recovery failed. Is this expected noting that I
>> have an internal firewall allowing incoming TCP port 2049 on the
>> server, and no firewall on the client, however I can't see how it can
>> thus be callback related?
>
> No. It looks as if your server rebooted while the client was recovering
> an expired lease.
>
> The following patch should prevent future occurrences of this bug...
>
> Cheers
>  Trond
> ------------------------------------------------------------------
> NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>
> From: Trond Myklebust <Trond.Myklebust@...>
>
> If our lease expires, and the server subsequently reboots, we need to be
> able to handle the case where the server refuses to let us recover state,
> because it is in the grace period.

[...]

Trond Myklebust | 4 Oct 2009 21:06

Re: [Bug 14276] nfsroot will not remount rw and claims illegal options

On Sun, 2009-10-04 at 17:48 +0000, bugzilla-daemon@... wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14276
> 
> --- Comment #6 from Hans de Bruin <bruinjm@...>  2009-10-04 17:48:57 ---
> hans@orion:/var/diskless/kernel/linux-2.6$ git bisect bad
> 53a0b9c4c99ab0085a06421f71592722e5b3fd5f is first bad commit
> commit 53a0b9c4c99ab0085a06421f71592722e5b3fd5f
> Author: Chuck Lever <chuck.lever@...>
> Date:   Sun Aug 9 15:09:36 2009 -0400
> 
>     NFS: Replace nfs_parse_ip_address() with rpc_pton()
> 
>     Clean up: Use the common routine now provided in sunrpc.ko for parsing
>     mount addresses.
> 
>     Signed-off-by: Chuck Lever <chuck.lever@...>
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@...>
> 
> :040000 040000 31c3998309b2c325b7ea927498e1a7e78e0e7747 513edbc936111eda9d1646db17e7240c77a32c42 M      fs
> hans@orion:/var/diskless/kernel/linux-2.6$

OK. Switching over to email.
[...]

Trond Myklebust | 4 Oct 2009 23:59

Re: [Bug 14276] nfsroot will not remount rw and claims illegal options

On Sun, 2009-10-04 at 15:06 -0400, Trond Myklebust wrote:
> Chuck, can you see how the rpc_pton() might be breaking nfsroot?
> The cmdline is of the form
> 
>     root=/dev/nfs nfsroot=10.10.0.2:/nfs/gemini,v3,tcp ro ip=::::::dhcp

I think I see it...

The difference is that rpc_pton4() starts with

	memset(sap, 0, sizeof(struct sockaddr_in));

That clears the port number that was set in nfs_remount(), and so the
comparison in nfs_compare_remount_data() fails.
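
[Editor's sketch, not part of the original message.] The effect
described above is easy to reproduce in userspace. The function
parse_ipv4 below is a hypothetical stand-in that only mimics the
zero-then-fill pattern; it is not the kernel's rpc_pton4(), and
inet_pton() is used here simply because it is the ordinary userspace
parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Parse a dotted-quad string into *sin.
 * Like the routine discussed above, it zeroes the whole structure
 * first, so any port the caller stored beforehand is lost.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address.
 */
int parse_ipv4(const char *text, struct sockaddr_in *sin)
{
    assert(text != NULL && sin != NULL);

    memset(sin, 0, sizeof(*sin));   /* clears sin_port too */
    sin->sin_family = AF_INET;
    return inet_pton(AF_INET, text, &sin->sin_addr) == 1 ? 0 : -1;
}
```

A caller that sets sin.sin_port = htons(2049) and then calls
parse_ipv4("10.10.0.2", &sin) ends up with sin.sin_port == 0, so any
later comparison against a structure that still carries the port will
not match.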

Does the following patch fix the problem?

--------------------------------------------------------------------------
NFS: Fix port initialisation in nfs_remount()
From: Trond Myklebust <Trond.Myklebust@...>

The recent changeset 53a0b9c4c99ab0085a06421f71592722e5b3fd5f (NFS: Replace
nfs_parse_ip_address() with rpc_pton()) broke nfs_remount, since the call
to rpc_pton() will zero out the port number in data->nfs_server.address.

This is actually due to a bug in nfs_remount: it should be looking at the
port number in nfs_server.port instead...

Signed-off-by: Trond Myklebust <Trond.Myklebust@...>
---
[...]

Trond Myklebust | 5 Oct 2009 00:10

Re: [2.6.31] NFS4ERR_GRACE unhandled...

On Sat, 2009-10-03 at 16:59 +0100, Daniel J Blueman wrote:
> Hi Trond,
> 
> On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
> <Trond.Myklebust@...> wrote:
> > On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
> >> Hi Trond,
> >>
> >> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
> >> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
> >> lock or file state recovery failed. Is this expected noting that I
> >> have an internal firewall allowing incoming TCP port 2049 on the
> >> server, and no firewall on the client, however I can't see how it can
> >> thus be callback related?
> >
> > No. It looks as if your server rebooted while the client was recovering
> > an expired lease.
> >
> > The following patch should prevent future occurrences of this bug...
> >
> > Cheers
> >  Trond
> > ------------------------------------------------------------------
> > NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
> >
> > From: Trond Myklebust <Trond.Myklebust@...>
> >
> > > If our lease expires, and the server subsequently reboots, we need to be
> > able to handle the case where the server refuses to let us recover state,
> > because it is in the grace period.
[...]

Daniel J Blueman | 5 Oct 2009 00:22

Re: [2.6.31] NFS4ERR_GRACE unhandled...

On Sun, Oct 4, 2009 at 11:10 PM, Trond Myklebust
<Trond.Myklebust@...> wrote:
> On Sat, 2009-10-03 at 16:59 +0100, Daniel J Blueman wrote:
>> Hi Trond,
>>
>> On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
>> <Trond.Myklebust@...> wrote:
>> > On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>> >> Hi Trond,
>> >>
>> >> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>> >> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>> >> lock or file state recovery failed. Is this expected noting that I
>> >> have an internal firewall allowing incoming TCP port 2049 on the
>> >> server, and no firewall on the client, however I can't see how it can
>> >> thus be callback related?
>> >
>> > No. It looks as if your server rebooted while the client was recovering
>> > an expired lease.
>> >
>> > The following patch should prevent future occurrences of this bug...
>> >
>> > Cheers
>> >  Trond
>> > ------------------------------------------------------------------
>> > NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>> >
>> > From: Trond Myklebust <Trond.Myklebust@...>
>> >
>> > If our lease expires, and the server subsequently reboots, we need to be
[...]

Trond Myklebust | 5 Oct 2009 00:33

Re: [PATCH] nfs: Avoid overrun when copying client IP address string

On Sun, 2009-10-04 at 14:25 +0100, Ben Hutchings wrote:
> As seen in <http://bugs.debian.org/549002>, nfs4_init_client() can
> overrun the source string when copying the client IP address from
> nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr.  Since
> these are both treated as null-terminated strings elsewhere, the copy
> should be done with strlcpy() not memcpy().
> 
> Signed-off-by: Ben Hutchings <ben@...>
> ---
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 75c9cd2..f525a2f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -1073,7 +1073,7 @@ static int nfs4_init_client(struct nfs_client *clp,
>  				      1, flags & NFS_MOUNT_NORESVPORT);
>  	if (error < 0)
>  		goto error;
> -	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> +	strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
>  
>  	error = nfs_idmap_new(clp);
>  	if (error < 0) {

It looks good, so I'll push it upstream. I assume the bug report also
applies to stable@...?

Thanks!

  Trond
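
[Editor's sketch, not part of the original message.] The difference the
patch relies on can be shown in userspace. memcpy() with
sizeof(destination) always reads that many bytes from the source, even
when the source string is shorter, whereas a strlcpy()-style copy stops
at the source's terminating NUL. Because strlcpy() is not available in
every C library, the sketch uses a hypothetical helper, bounded_copy,
written to follow the same contract; it is not the kernel
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the NUL-terminated string src into dst, which holds dst_size
 * bytes. Never reads past src's terminator and always terminates dst
 * when dst_size > 0. Returns strlen(src), so a result >= dst_size
 * means the copy was truncated.
 */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL);

    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);    /* reads only bytes that belong to src */
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "10.10.0.2" into a 48-byte buffer this way reads 9 bytes plus
the terminator check, where memcpy(dst, src, 48) would read 48 bytes
from a 10-byte string.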
[...]

Ben Hutchings | 5 Oct 2009 01:24

Re: Bug#549002: [PATCH] nfs: Avoid overrun when copying client IP address string

On Sun, 2009-10-04 at 18:33 -0400, Trond Myklebust wrote:
> On Sun, 2009-10-04 at 14:25 +0100, Ben Hutchings wrote:
> > As seen in <http://bugs.debian.org/549002>, nfs4_init_client() can
> > overrun the source string when copying the client IP address from
> > nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr.  Since
> > these are both treated as null-terminated strings elsewhere, the copy
> > should be done with strlcpy() not memcpy().
> > 
> > Signed-off-by: Ben Hutchings <ben@...>
> > ---
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 75c9cd2..f525a2f 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -1073,7 +1073,7 @@ static int nfs4_init_client(struct nfs_client *clp,
> >  				      1, flags & NFS_MOUNT_NORESVPORT);
> >  	if (error < 0)
> >  		goto error;
> > -	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> > +	strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> >  
> >  	error = nfs_idmap_new(clp);
> >  	if (error < 0) {
> 
> It looks good, so I'll push it upstream. I assume the bug report also
> applies to stable@...?

This bug appears to have been present "forever", though I think it has
become a practical problem since nfs_client::cl_ipaddr was enlarged to
48 bytes in v2.6.25.  So yes, the bug and fix are applicable to every
[...]

