1 Dec 2010 01:34
Re: Stale NFS file handles on 8.x amd64
Adam McDougall <mcdouga9 <at> egr.msu.edu>
2010-12-01 00:34:46 GMT
On 11/30/10 09:33, John Baldwin wrote:
> On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail which doesn't touch the dovecot metadata and
>> webmail uses imapd. Client connections to imapd go to random servers
>> and I don't yet have solid means to keep certain users on certain
>> servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into Stale NFS file handles causing index/uidlist corruption causing
>> inboxes to appear as empty when they were not. In some situations their
>> corrupt index had to be deleted manually. I first suspected dovecot 1.2
>> since it was upgraded at the same time but I downgraded to 1.1 and it's
>> doing the same thing. I don't really have a wealth of details to go on
>> yet and I usually stay quiet until I do, and half the time it is
>> difficult to reproduce myself so I've had to put it in production to get
>> a feel for progress. This only happens a dozen or so times per weekday
>> but I feel the need to start taking bigger steps. I'll probably do what
>> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
>> on the remaining servers. A binary search is within possibility if I
>> can reproduce the symptoms often enough even if I have to put a test
>> server in production for a few hours.
>
> There were some changes to allow more concurrency in the NFS client in 8 (and
> 7.2+) that caused ESTALE errors to occur on open(2) more frequently. You can
> try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a
> performance cost) as a workaround. The most recent 7.x and 8.x have some
> changes to open(2) to minimize ESTALE errors that I think get it back to the
> same level as when lookup_shared is set to 0.
>
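
For illustration only, here is a minimal sketch of what retrying on ESTALE could look like at the application level. It is not what the FreeBSD kernel changes or dovecot actually do. The helper name `open_retrying_estale` and its `attempts`, `delay` and `opener` parameters are hypothetical, chosen for this example.

```python
import errno
import time


def open_retrying_estale(path, mode="rb", attempts=3, delay=0.05, opener=open):
    """Open `path`, retrying a bounded number of times on ESTALE.

    Hypothetical helper: `opener` defaults to the built-in open() and is
    only a parameter so the retry logic can be exercised without NFS.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return opener(path, mode)
        except OSError as exc:
            # Only a stale file handle is worth retrying; re-raise any
            # other error, and give up once the last attempt has failed.
            if exc.errno != errno.ESTALE or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A retry of this kind only hides transient ESTALE results from a fresh lookup. It does not repair an index that has already been corrupted.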
Adrian
On 1 December 2010 19:16, David DEMELIER <demelier.david <at> gmail.com> wrote:
> 2010/12/1 David DEMELIER <demelier.david <at> gmail.com>:
>> 2010/12/1 Adrian Chadd <adrian <at> freebsd.org>:
>>> On 1 December 2010 18:11, David Demelier <demelier.david <at> gmail.com> wrote:
>>>
>>>
>>>>> Or is this somehow running in 11A mode? Would you please paste
>>>>> 'ifconfig wlan0' here?
>>>>
>>>> markand <at> Melon ~ $ ifconfig wlan0
>>>> wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>> ether c4:17:fe:c4:14:b9
>>>> inet 130.79.183.186 netmask 0xfffffc00 broadcast 130.79.183.255
>>>> media: IEEE 802.11 Wireless Ethernet OFDM/36Mbps mode 11g
>>>> status: associated
>>>> ssid osiris-sec channel 11 (2462 MHz 11g) bssid 00:26:99:23:69:23
>>>> regdomain 106 indoor ecm authmode WPA2/802.11i privacy ON
>>>> deftxkey UNDEF TKIP 2:128-bit TKIP 3:128-bit txpower 20 bmiss 7
>>>> scanvalid 450 bgscan bgscanintvl 300 bgscanidle 250 roam:rssi 7
>>>> roam:rate 5 protmode CTS wme burst roaming MANUAL bintval 102
>>>
>>> Ok, so it's in 11bg mode. I'd have to do some digging to try and